Monday, 4 November 2013

Paper summary : Information Sharing Across Private Databases

Recent trends (e.g., outsourcing) in the field of information integration highlight the importance of security when sharing information across autonomous DBs. It is essential that queries (which span multiple DBs) be answered in a terse manner i.e., no additional information apart from query results is leaked. Present solutions are inadequate (e.g., using third party middlemen, which requires a high amount of trust on these middlemen). Instead, the authors propose certain cryptographic protocols which are used to simulate trusted third parties. Protocols for 4 operations are explored. These protocols only permit a minimal necessary amount information to be shared for queries, assuming semi-honest parties.

The protocols support 4 operations- (i) intersection, (ii) equijoin, (iii) intersection size and (iv) equijoin size. For (i), a naïve protocol is proposed, which has the disadvantage that R (the receiver) can guess all additional information sent by S (the sender) if the domain is small. This can be fixed by using a pair of commutative functions (assuming ideal hashing). For (ii), yet another naïve protocol is proposed, which incorrectly encrypts extra information. This has the same weakness as that in (i) wrt the naïve protocol. To overcome this, the extra information is encrypted with a key, which can obtained by inverting a secure function. For (iii), the intersection protocol in (i) is modified so that R learns only about the intersection size, and not the intersection set (which would also result in the intersection size). For (iv), the protocol is similar to that in (iii). All the 4 protocols are all proven correct and secure assuming semi-honest parties (i.e., they disclose minimal information apart from the query result).

The authors acknowledge certain limitations of their protocols. Their protocols do not make any guarantees for multiple queries (which can be handled in other ways e.g. maintaining audit trails). They also assume that database schemas are known, and don't account for schema heterogeneity.

Finally, the authors present the theoretical efficiency of the proposed protocols and also perform estimated cost analysis for certain scenarios.

No comments:

Post a Comment