kenny-adam / searchproject

COP2805 Team Search Project

Java 100.00%

searchproject's Introduction

SearchProject

COP2805 Team Search Project

searchproject's People

Contributors

Stargazers

Watchers

Forkers

lloydee

searchproject's Issues

'rebuild index' button not implemented

The button should run a method that scans the files listed in the index and re-indexes them.

OR search not implemented

OR Search

This is the easiest one to implement. The general idea is to start with an empty Set of matching files, then add to that Set the files containing each search term: just look up that word in the Map and add each document found (if any). The result is the OR search result, the files that contain any word in the search list. (If the user inputs no search words, say “ ,.”, then no files are considered as matching.)
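The OR search described above could be sketched roughly as follows. This is only an illustration, not the project's code: the class name `OrSearch`, the `search` method, and the simplified index shape (a Map from each word to the Set of file names containing it) are assumptions, and the search terms are assumed to be normalized the same way as the index keys.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the real project may store Pair objects instead of file names.
final class OrSearch {
    /** Returns the files that contain at least one of the given terms. */
    static Set<String> search(Map<String, Set<String>> index, List<String> terms) {
        Set<String> matches = new HashSet<>();      // start with no matching files
        for (String term : terms) {
            Set<String> files = index.get(term);    // null when the word is not indexed
            if (files != null) {
                matches.addAll(files);              // union with the files for this term
            }
        }
        return matches;                             // empty when no terms were given
    }
}
```

With no search terms the loop never runs, so the result is the empty Set, which matches the behavior described in the issue.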

'remove file' button not implemented

The button should let the user select one or more files in the list and remove them from the index.

check files on startup for changes

When the program starts, it should run a method that checks each indexed file to see whether it has been deleted or whether its last-modified time differs from the one currently listed in the index.

If there are changes, the maintenance panel should be shown so the user can rebuild the index.
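A minimal sketch of such a startup check, assuming the index can be read into a Map from file pathname to the recorded last-modified time in milliseconds. The class name `StartupCheck` and the method `findStale` are hypothetical; only `java.io.File` calls from the standard library are used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the existing project.
final class StartupCheck {
    /** Returns the indexed paths that are missing or whose timestamp has changed. */
    static List<String> findStale(Map<String, Long> indexedFiles) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : indexedFiles.entrySet()) {
            File file = new File(entry.getKey());
            long recorded = entry.getValue();
            // File.lastModified() returns 0L for a missing file, but check exists() explicitly.
            if (!file.exists() || file.lastModified() != recorded) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

If the returned list is not empty, the caller would show the maintenance panel; how that panel is opened depends on the GUI code and is not shown here.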

PHRASE search not implemented

PHRASE Search

This is the hardest search to implement. Unlike the OR and the AND searches, with PHRASE searching, the position of the search terms in the files matters. The algorithm I came up with is:

Create an initially empty Set of Pair objects.
Add to the set the Pair objects for the files that contain the first word of the phrase. This is the easy part: just look up that word in the Map, and add all Pair objects found to the set.
The Set now contains Pair objects for just the files that might contain the phrase. Next, loop over the remaining words of the phrase, removing any Pairs from the set that are no longer possible phrase continuations. (Actually, I just build a new Set.)

For each remaining word in the phrase:
Create a new, empty set of Pairs.
For each Pair in the previous set, see if the word appears in the same file, but in the next position. If so, add the Pair object for the word to the new set.
An example may help clarify this. Suppose the search phrase is “big top now”. The set initially contains all the Pair objects for the word “big”. Let's say, for example, that the set looks like:

(file1,position7), (file1,position22), (file3,position4)
For each Pair object in that set, you need to see if “top” is in that same file, at the next position. If so, you add the Pair object for that to the new Set. The (inner) loop for this example checks each of the following:

Is a (file1,position8) Pair object in the Map for the word "top"?

Is a (file1,position23) Pair object in the Map for the word "top"?

Is a (file3,position5) Pair object in the Map for the word "top"?
If the answer is “yes”, then add that Pair object to the new set. When this loop ends, the new set will contain the Pair objects for the phrase “big top” (pointing to the position of the word “top”).

For example, suppose “top” is only found in (file1,position8) and (file3,position5). You replace the first set with this new set:

(file1,position8), (file3,position5)
Repeat for the next word in the phrase, using the set built in the previous loop.
Continue until the set is empty (so phrase not found), or until the last word of the phrase has been processed. The Pair objects remaining in the final set are the ones that contain the phrase; the position will be that of the last word of the phrase. (We only need to display the file name; in this project, the position of the phrase doesn't matter.)
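The steps above could be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the `Pair` class is assumed to hold a file name and a word position, the index is assumed to be a Map from each word to its Set of Pair objects, and the names `PhraseSearch` and `search` are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class PhraseSearch {
    /** Assumed shape of a Pair: a file name plus the position of a word in that file. */
    static final class Pair {
        final String file;
        final int position;
        Pair(String file, int position) { this.file = file; this.position = position; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair other = (Pair) o;
            return position == other.position && file.equals(other.file);
        }
        @Override public int hashCode() { return Objects.hash(file, position); }
    }

    /** Returns the names of the files that contain the words of the phrase consecutively. */
    static Set<String> search(Map<String, Set<Pair>> index, List<String> phrase) {
        Set<String> files = new TreeSet<>();
        if (phrase.isEmpty()) return files;
        // Start with every occurrence of the first word.
        Set<Pair> current = new HashSet<>(
                index.getOrDefault(phrase.get(0), Collections.<Pair>emptySet()));
        // For each remaining word, keep only occurrences that continue the phrase.
        for (int i = 1; i < phrase.size() && !current.isEmpty(); i++) {
            Set<Pair> postings = index.getOrDefault(phrase.get(i), Collections.<Pair>emptySet());
            Set<Pair> next = new HashSet<>();
            for (Pair p : current) {
                Pair candidate = new Pair(p.file, p.position + 1);
                if (postings.contains(candidate)) next.add(candidate);
            }
            current = next;   // replace the old set with the newly built one
        }
        for (Pair p : current) files.add(p.file);   // only file names are displayed
        return files;
    }
}
```

Because `Pair` overrides `equals` and `hashCode`, the `contains` call is a hash lookup, so each word of the phrase costs time proportional to the size of the current set.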

test data is not present

About five plain-text files should be created with sample sentences inside, so we can test the search functions on the data in these files.

inverted index file not created

The file should be a plain-text document with a format such as this:

[program name] [version number]
[total number of indexed files]

[index number] [indexed file pathname] [last modification time]
[index number] [indexed file pathname] [last modification time]
[index number] [indexed file pathname] [last modification time]

[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
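One way to produce a file in the sample layout above is sketched below. Everything named here is an assumption made for illustration: the class `IndexFileSketch`, the `render` method, the use of the list position as the index number, and the representation of each posting as an `int[]` holding `{index number, location}`. Note that the space-separated layout would be ambiguous for pathnames that contain spaces, so a real format may need quoting or a different separator.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical writer for the sample layout; it returns the text instead of writing to disk.
final class IndexFileSketch {
    static String render(String program, String version,
                         List<String> paths, List<Long> lastModified,
                         Map<String, List<int[]>> postings) {
        StringBuilder out = new StringBuilder();
        out.append(program).append(' ').append(version).append('\n');   // header line
        out.append(paths.size()).append('\n').append('\n');             // file count, blank line
        for (int i = 0; i < paths.size(); i++) {
            // [index number] [indexed file pathname] [last modification time]
            out.append(i).append(' ').append(paths.get(i))
               .append(' ').append(lastModified.get(i)).append('\n');
        }
        out.append('\n');
        // Words are written in sorted order so the output is deterministic.
        for (Map.Entry<String, List<int[]>> entry : new TreeMap<>(postings).entrySet()) {
            out.append(entry.getKey());
            for (int[] posting : entry.getValue()) {
                // [index number],[location]
                out.append(' ').append(posting[0]).append(',').append(posting[1]);
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```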

AND search not implemented

AND Search

This is done the opposite way from an OR search, and is only a little harder to implement. The idea is to start with a Set of all files in the index. Then, for each search term, check each file in the Set to make sure it appears in the index entry for that term, and remove any file that doesn't contain that word. The resulting final Set is the documents matching all search terms. (If the user inputs no search words, say “ ,.”, then all files are considered as matching. If that isn't the behavior you want, you need to treat that as a special case.)
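A rough sketch of the AND search, using the same simplifying assumptions as a word-to-file-names Map for the index. The class name `AndSearch`, the `search` method, and the separate `allFiles` parameter are hypothetical and not taken from the project.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the real project may store Pair objects instead of file names.
final class AndSearch {
    /** Returns the files that contain every one of the given terms. */
    static Set<String> search(Map<String, Set<String>> index,
                              Set<String> allFiles, List<String> terms) {
        Set<String> matches = new HashSet<>(allFiles);   // start with every indexed file
        for (String term : terms) {
            Set<String> files = index.getOrDefault(term, Collections.<String>emptySet());
            matches.retainAll(files);                    // drop files lacking this term
        }
        return matches;                                  // all files when no terms were given
    }
}
```

With an empty term list the result is every indexed file, as the issue notes. A caller that wants "no terms, no matches" would need to check `terms.isEmpty()` before calling.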

inverted index format not documented

A file separate from the index should be created that describes the index file format in detail, so that someone can write their own search program to read the data inside the inverted index.

'add file' button not implemented

The button should let the user select a file on the filesystem and add it to the index.
