This is an open-source program that enables semantic enterprise search for unstructured datasets stored in the cloud. If you are using this tool in your research projects, please cite the following paper:
@inproceedings{woodworth2016s3c,
title={S3C: An architecture for space-efficient semantic search over encrypted data in the cloud},
author={Woodworth, Jason and Salehi, Mohsen Amini and Raghavan, Vijay},
booktitle={Proceedings of the 4th IEEE International Conference on Big Data},
series={Big Data'16},
pages={3722--3731},
year={2016},
month={December}
}
The paper is also available at the following address: http://hpcclab.org/paperPdf/bigdata16/bigdata16.pdf
Below, you can find the steps to execute the program:
- Clone the repository:
git clone https://github.com/hpcclab/S3C.git
- Grant execute permission (chmod a+x) on the contents of the repository folder. Example:
chmod a+x /home/[PATHTOFOLDER]/S3C/*
- Run the script
./exec.sh
to create the folder and unzip the demo dataset.
- Check that a 'cloud' folder has been created in the parent folder. If it has, the script ran successfully.
- Open Eclipse or another Java IDE.
- Import the two projects:
a. Open Project -> Path to S3C -> SemanticSearchClient
b. Open Project -> Path to S3C -> SemanticSearchCloud
If you see an error in the client code, add the "jsoup" jar file to the build path: in the project's "lib" directory, right-click jsoup-1.8.2.jar, click "Build Path", and choose "Add to Build Path".
- Run the cloud project (main class: SemanticSearchCloud.java).
- Run the client project (main class: SemanticSearchClient.java).
- Upload the demo dataset:
a. Type
-u
in the client console.
b. Provide the upload path. Type
input/demo_dataset
Our demo dataset is ready to be uploaded.
- After uploading, stop the client. At this point, the server has already built the index and docSize.
- Re-run the client project. Type
-s
to search over the index.
- From the results, note the file(s) that you want to decrypt.
- Go to the Semantic Search Client -> uploads folder. Copy those files into the
Semantic Search Client -> data -> input_encrypted folder.
- Re-run the client project. Choose
-d
to decrypt the resulting document. Enter
file_name.txt
- After getting the success message, go to the
Semantic Search Client -> data -> output_decrypted folder
and open the decrypted file to read its content.
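The file-system parts of the steps above (checking that exec.sh created the 'cloud' folder, and copying search results from the uploads folder into the input_encrypted folder) can be sketched as a small shell helper. This is only an illustration: the function names s3c_check_cloud and s3c_stage_for_decrypt are hypothetical and not part of the repository, and the directories are passed as arguments because the exact layout on your machine may differ. The paths in the usage comments assume the client project lives in a folder named SemanticSearchClient.

```shell
#!/bin/sh
# Hypothetical helpers for the manual steps in this README.
# Nothing here ships with S3C; names and arguments are illustrative.

# Succeeds if a 'cloud' folder exists inside the given directory,
# which the README treats as the sign that exec.sh ran successfully.
# Usage: s3c_check_cloud PARENT_DIR
s3c_check_cloud() {
    if [ -d "$1/cloud" ]; then
        echo "found: $1/cloud"
    else
        echo "missing: $1/cloud (did ./exec.sh run successfully?)" >&2
        return 1
    fi
}

# Copies the named files from the uploads folder into the
# input_encrypted folder so the client can decrypt them with -d.
# Usage: s3c_stage_for_decrypt UPLOADS_DIR INPUT_ENCRYPTED_DIR FILE...
s3c_stage_for_decrypt() {
    if [ "$#" -lt 3 ]; then
        echo "usage: s3c_stage_for_decrypt UPLOADS_DIR INPUT_ENCRYPTED_DIR FILE..." >&2
        return 2
    fi
    s3c_src="$1"
    s3c_dst="$2"
    shift 2
    if [ ! -d "$s3c_dst" ]; then
        echo "destination folder does not exist: $s3c_dst" >&2
        return 1
    fi
    for s3c_name in "$@"; do
        if [ -f "$s3c_src/$s3c_name" ]; then
            cp "$s3c_src/$s3c_name" "$s3c_dst/" || return 1
            echo "staged: $s3c_name"
        else
            echo "not found in uploads: $s3c_name" >&2
            return 1
        fi
    done
}

# Example (assumed paths, adjust to your checkout):
#   s3c_check_cloud /home/[PATHTOFOLDER]
#   s3c_stage_for_decrypt SemanticSearchClient/uploads \
#       SemanticSearchClient/data/input_encrypted file_name.txt
```

Passing the folders as arguments keeps the sketch independent of where the repository was cloned. The staging function stops at the first missing file, so a mistyped name from the search results is reported before the client is re-run.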
We welcome new features, extensions, or enhancements to S3C.
In particular, we are looking for new collaborations to take this framework further. As an extension of S3C, we have developed S3BD, which is similar to S3C but can perform search over encrypted Big Data. We are also working on extending the capabilities of S3C with search query expansion, intelligent pruning, clustering, and so on. Please drop us an email if you are interested.