Fuzzy experiences very frustrating and results in low

Fuzzy Keyword Search over
Encrypted Data in Cloud Computing

 

Pranjuli Yavatkar, Nikita
Patil, Sneha Kale

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!


order now

Department of Computer
Engineering, Bharati Vidyapeeth College of Engineering, Navi Mumbai,
Maharashtra

 

 

Abstract –
As Cloud Computing is highly dominating technology in recent days, entire
sensitive information is being stored onto the cloud. For maintaining data confidentiality,
sensitive data are generally encrypted, which makes effective data utilization
a very complex task. The Existing searchable encryption schemes provides a way
for secure search over encrypted data using keywords and retrieving the
necessary files. Whereas these techniques support only exact keyword search.
That is, there is no acceptance of slight typos and format inconsistencies
which are typical user searching behavior. Because of this drawback, the
existing techniques becomes incompatible in cloud computing, affecting the
system usability. This makes the user searching experiences very frustrating
and results in low system efficiency. This paper includes the formalization and
solution of the problem of effective fuzzy keyword search over encrypted cloud
data as well as preserving keyword privacy. Fuzzy keyword search helps to
enhance the system usability by generating the matching files when users’
searching inputs exactly match the predefined keywords or the closest possible
matching files based on keyword similarity semantics, when exact match fails.

 

KEYWORDS: Encryption, Fuzzy Keyword, Cloud Computing

 

I.                   
INTRODUCTION

 

As Cloud Computing
is highly dominating technology in recent days, entire sensitive information is
being stored onto the cloud, such as emails, health
records, government documents, personal data etc. By putting away their
information into the cloud, the data owners can be relieved from the burden of
data storage and maintenance so as to enjoy the on-demand high quality data
storage service. However, the cloud server may no longer be fully trusted. The sensitive
data usually should be encrypted prior to outsourcing for data privacy and
preventing unsolicited accesses. However, data encryption makes effective data
utilization a very challenging task given that there could be a large amount of
outsourced data files. Besides, in Cloud Computing, information owners may
share their outsourced information with countless. The individual users might
want to only retrieve certain specific data files they are interested in during
a given session. One of the most popular ways is to selectively retrieve files
through keyword-based search instead of retrieving all the encrypted files back
which is completely impractical in cloud computing scenarios. Such
keyword-based search technique allows users to selectively retrieve files of
interest and has been widely applied in plaintext search scenarios, such as
Google search. Unfortunately, data encryption restricts user’s ability to
perform keyword search and thus makes the traditional plaintext search methods
unsuitable for Cloud Computing. Besides this, data encryption also demands the
protection of keyword privacy since keywords usually contain important
information related to the data files. Although encryption of keywords can protect
keyword privacy, it further renders the traditional plaintext search techniques
useless in this scenario. To securely search over encrypted data, searchable
encryption techniques have been developed in recent years. Searchable
encryption schemes usually build up an index for each keyword of interest and
associate the index with the files that contain the keyword. By integrating the
trapdoors of keywords within the index information, effective keyword search
can be realized while both file content and keyword privacy are well-preserved.
Although allowing for performing searches securely and effectively, the
existing searchable encryption techniques do not suit for cloud computing
scenario since they support only exact keyword search. That is, there is no tolerance
of minor typos and format inconsistencies.

It is quite common that users’ searching input might
not exactly match those pre-set keywords due to the possible typos,
representation inconsistencies, and/or her lack of exact knowledge about the
data. The naive way to support fuzzy keyword search is through simple spell
check mechanisms. However, this approach does not completely solve the problem
and sometimes can be ineffective due to the following reasons: on the one hand,
it requires additional interaction of user to determine the correct word from
the candidates generated by the spell check algorithm, which unnecessarily
costs user’s extra computation effort; on the other hand, in case that user
accidentally types some other valid keywords by mistake (for example, search
for “hat” by carelessly typing “cat”), the spell check algorithm would not even
work at all, as it can never differentiate between two actual valid words.
Thus, the drawbacks of existing schemes signifies the important need for new techniques
that support searching flexibility, tolerating both minor typos and format
inconsistencies.

In this paper, we focus on enabling effective yet
privacy-preserving fuzzy keyword search in Cloud Computing. To the best of our
knowledge, we formalize for the first time the problem of effective fuzzy
keyword search over encrypted cloud data while maintaining keyword privacy.
Fuzzy keyword search greatly enhances system usability by returning the
matching files when users’ searching inputs exactly match the predefined
keywords or the closest possible matching files based on keyword similarity
semantics, when exact match fails. More specifically, we use edit distance to
quantify keywords similarity and develop a novel technique, i.e., a
wildcard-based technique, for the construction of fuzzy keyword sets. This
technique eliminates the need for enumerating all the fuzzy keywords and the
resulted size of the fuzzy keyword sets is significantly reduced. Based on the
constructed fuzzy keyword sets, we propose an efficient fuzzy keyword search
scheme.

 

II.                
RELATED WORK

 

Plaintext fuzzy
keyword search: Recently, the importance of fuzzy
search has received attention in the context of plaintext searching in
information retrieval community 11–13. They addressed this problem in the
traditional information access paradigm by allowing user to search without
using try-and-see approach for finding relevant information based on
approximate string matching. At the first glance, it seems possible for one to
directly apply these string matching algorithms to the context of searchable
encryption by computing the trapdoors on a character base within an alphabet.
However, this trivial construction suffers from the dictionary and statistics
attacks and fails to achieve the search privacy.

 

Searchable
encryption: Traditional searchable encryption has
been widely studied in the context of cryptography. Among those works, most are
focused on efficiency improvements and security definition formalizations. The
first construction of searchable encryption was proposed by Song et al., in
which each word in the document is encrypted independently under a special
two-layered encryption construction. Goh proposed to use Bloom filters to
construct the indexes for the data files. To achieve more efficient search,
Chang et al. and Curtmola et al. both proposed similar “index” approaches,
where a single encrypted hash table index is built for the entire file
collection. In the index table, each entry consists of the trapdoor of a
keyword and an encrypted set of file identifiers whose corresponding data files
contain the keyword. As a complementary approach, Boneh et al. presented a
public-key based searchable encryption scheme. Note that all these existing
schemes support only exact keyword search, and thus are not suitable for Cloud
Computing.

 

III.             
PROBLEM FORMULATION

 

A. System Model:

 

 

 

In this paper, we
consider a cloud data system consisting of data owner, data user and cloud
server. Given a collection of n encrypted data files C = (F1, F2,…, FN)
stored in the cloud server, a predefined set of distinct keywords W = {w1, w2,
…, wp}, the cloud server provides the search service for the authorized users
over the encrypted data C. We assume the authorization between the data owner
and users is appropriately done. An authorized user types in a request to
selectively retrieve data files of his/her interest. The cloud server is
responsible for mapping the searching request to a set of data files, where
each file is indexed by a file ID and linked to a set of keywords. The fuzzy
keyword search scheme returns the search results according to the following
rules: 1) if the user’s searching input exactly matches the pre-set keyword,
the server is expected to return the files containing the keyword1; 2) if there
exist typos and/or format inconsistencies in the searching input, the server
will return the closest possible results based on pre-specified similarity
semantics. An architecture of fuzzy keyword search is shown in the Fig. 1.

 

B.
Threat Model

 

We consider a
semi-trusted server. Even though data files are encrypted, the cloud server may
try to derive other sensitive information from users’ search requests while
performing keyword-based search over C. Thus, the search should be conducted in
a secure manner that allows data files to be securely retrieved while revealing
as little information as possible to the cloud server. More specifically, it is
required that nothing should be leaked from the remotely stored files and index
beyond the outcome and the pattern of search queries.

 

C.
Design Goals

 

In this paper, we
address the problem of supporting efficient yet privacy-preserving fuzzy
keyword search services over encrypted cloud data. Specifically, we have the
following goals: i) to explore new mechanism for constructing storage efficient
fuzzy keyword sets; ii) to design efficient and effective fuzzy search scheme
based on the constructed fuzzy keyword sets; iii) to validate the security of
the proposed scheme.

 

IV. CONSTRUCTIONS OF EFFECTIVE
FUZZY KEYWORD SEARCH IN CLOUD

 

The key idea behind
our secure fuzzy keyword search is two-fold: 1) building up fuzzy keyword sets
that incorporate not only the exact keywords but also the ones differing
slightly due to minor typos, format inconsistencies, etc.; 2) designing an
efficient and secure searching approach for file retrieval based on the resulted
fuzzy keyword sets.

 

A.
Advanced Technique for Constructing Fuzzy Keyword Sets

 

To provide more
practical and effective fuzzy keyword search constructions with regard to both
storage and search efficiency, we now propose an advanced technique to improve
the straightforward approach for constructing the fuzzy keyword set. Without
loss of generality, we will focus on the case of edit distance d = 1 to
elaborate the proposed advanced technique. For larger values of d, the
reasoning is similar. Note that the technique is carefully designed in such a
way that while suppressing the fuzzy keyword set, it will not affect the search
correctness.

 

Wildcard-based
Fuzzy Set Construction

 

In the above
straightforward approach, all the variants of the keywords have to be listed
even if an operation is performed at the same position. Based on the above
observation, we proposed to use a wildcard to denote edit operations at the
same position. The wildcard-based fuzzy set of wi with edit distance
d is denoted as S wi,d ={S’wi,0, S’wi,2, ··· ,
S’wi,d }, where S’ wi ,?  denotes the set of words wi with ? wildcards.
Note each wildcard represents an edit operation on wi. For example,
for the keyword CASTLE with the pre-set edit distance 1, its wildcard-based
fuzzy keyword set can be constructed as SCASTLE,1 = {CASTLE, *CASTLE, *ASTLE,
C*ASTLE, C*STLE, ··· , CASTL*E, CASTL*, CASTLE*}. The total number of variants
on CASTLE constructed in this way is only 13 + 1, instead of 13 × 26 + 1 as in
the above exhaustive enumeration approach when the edit distance is set to be
1. Generally, for a given keyword wi with length l,
the size of S wi,1 will be only 2 l +1+1,
as compared to (2 l + 1) × 26 + 1 obtained in the
straightforward approach. The larger the pre-set edit distance, the more
storage overhead can be reduced: with the same setting of the example in the straightforward
approach, the proposed technique can help reduce the storage of the index from
30GB to approximately 40MB. In case the edit distance is set to be 2 and 3, the
size of S wi,2 and S wi,3 will be C1 l +1+C1 l ·
C1 l
+2C2 l +2 and
C1 l
+ C3 l + 2C2 l +
2C2 l
· C1 l . In other words, the number is only O( l d)
for the keyword with length l
and edit distance d.

 

B.
The Efficient Fuzzy Keyword Search Scheme

 

Based on the
storage-efficient fuzzy keyword sets, we show how to construct an efficient and
effective fuzzy keyword search scheme. The scheme of the fuzzy keyword search
goes as follows: 1) To build an index for wi with edit distance d, the data
owner first constructs a fuzzy keyword set Swi,d using the wildcard
based technique. Then he computes trapdoor set {Tw’i} for each w’i
? S
wi,d with a secret key sk shared between data owner and authorized users.
The data owner encrypts FIDwi as Enc(sk, FIDwi  || wi). The index table {({ Tw’i
}w’i? S
wi,d , Enc(sk, FIDwi || wi ))} wi ?W and encrypted data files are
outsourced to the cloud server for storage;

2) To search with
(w, k), the authorized user computes the trapdoor set {Tw’}w’
? Sw,k , where Sw,k is
also derived from the wildcard-based fuzzy set construction. He then sends {Tw’}w’
? Sw,k to the server;

3) Upon receiving
the search request {Tw’}w’ ? Sw,k, the server
compares them with the index table and returns all the possible encrypted file
identifiers {Enc(sk, FIDwi || wi)} according to the fuzzy
keyword definition in section III-D. The user decrypts the returned results and
retrieves relevant files of interest.

In this
construction, the technique of constructing search request for w is the same as
the construction of index for a keyword. As a result, the search request is a
trapdoor set based on Sw,k , instead of a single trapdoor as in the
straightforward approach. In this way, the searching result correctness can be
ensured.

 

V. CONCLUSION

 

In this paper, we formalize and solve the problem of
supporting efficient yet privacy-preserving fuzzy search for achieving
effective utilization of remotely stored encrypted data in Cloud Computing. We
design an advanced technique (i.e., wildcard-based technique) to construct the
storage-efficient fuzzy keyword sets by exploiting a significant observation on
the similarity metric of edit distance. Based on the constructed fuzzy keyword
sets, we further propose an efficient fuzzy keyword search scheme. Through
rigorous security analysis, we show that our proposed solution is secure and
privacy-preserving, while correctly realizing the goal of fuzzy keyword search.

 

REFERENCES

1 D. Song, D. Wagner, and A. Perrig, “Practical
techniques for searches on encrypted data,” in Proc. of IEEE Symposium on
Security and Privacy’00, 2000.

2 E.-J. Goh, “Secure indexes,” Cryptology ePrint
Archive, Report 2003/216, 2003, http://eprint.iacr.org/.

3 Fuzzy keyword search over encrypted data in cloud
computing”, Illinois Institute of Technology, ISSN: 2321- 8134.

4 Y.-C. Chang and M. Mitzenmacher, “Privacy
preserving keyword searches on remote encrypted data,” in Proc. of ACNS’05,
2005.

5 D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G.
Persiano, “Public key encryption with keyword search,” in Proc. of EUROCRYP’04,
2004.

6 R. Curtmola, J. A. Garay, S. Kamara, and R.
Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient
constructions,” in Proc. of ACM CCS’06, 2006.

7 D. Boneh and B. Waters, “Conjunctive, subset, and
range queries on encrypted data,” in Proc. of TCC’07, 2007, pp. 535–554.

8 REVIEW PAPER ON FUZZY SEARCH OVER ENCRYPTED DATA
IN CLOUD COMPUTING by Neel Gala ISSN:2393-9842

9 F. Bao, R. Deng, X. Ding, and Y. Yang, “Private
query on encrypted data in multi-user settings,” in Proc. of ISPEC’08, 2008.

10 C. Li, J. Lu, and Y. Lu, “Efficient merging and
filtering algorithms for approximate string searches,” in Proc. of ICDE’08,
2008.