
searchproject's Introduction

SearchProject

COP2805 Team Search Project

searchproject's People

Contributors

dmyang21

Stargazers

Lloyd Jayson Pintac

Watchers

Adam Kenny

Forkers

lloydee

searchproject's Issues

OR search not implemented

OR Search

This is the easiest one to implement. The general idea is to start with an empty Set of matching files. Then, for each search term, add the files containing that term to the Set: just look up the word in the Map and add each document found (if any). The result is the OR search result: the files that contain any word in the search list. (If the user inputs no search words, say “ ,.”, then no files are considered as matching.)
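A minimal Java sketch of the OR search described above. It assumes, hypothetically, that the index is a Map from each lowercase word to the Set of file names containing it, and that a query is split on any character that is not a letter or digit. The names OrSearch, orSearch and tokenize are illustrative only, not taken from the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class OrSearch {

    // Hypothetical tokenizer: lowercase the query and split on anything that is
    // not a letter or digit, dropping empty pieces (so " ,." yields no terms).
    static List<String> tokenize(String query) {
        List<String> terms = new ArrayList<>();
        for (String piece : query.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // OR search: start with an empty set and add the files for every term.
    // Assumes index maps word -> set of file names (a hypothetical layout).
    static Set<String> orSearch(Map<String, Set<String>> index, List<String> terms) {
        Set<String> matches = new TreeSet<>(); // sorted for stable display
        for (String term : terms) {
            Set<String> files = index.get(term);
            if (files != null) { // a word missing from the index adds nothing
                matches.addAll(files);
            }
        }
        return matches; // empty when there are no terms
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("big", Set.of("file1", "file3"));
        index.put("top", Set.of("file1", "file2"));
        System.out.println(orSearch(index, tokenize("Big, top; zebra"))); // [file1, file2, file3]
        System.out.println(orSearch(index, tokenize(" ,.")));             // []
    }
}
```

Using a TreeSet is just a convenience so results print in a predictable order; a HashSet would give the same matches.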

check files on startup for changes

When the program starts, it should run a method that checks each indexed file to see whether it has been deleted or whether its last modification time differs from the one recorded in the index.

If there are changes, the maintenance panel should be shown so the user can rebuild the index.
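A minimal Java sketch of the startup check, assuming (hypothetically) that the index has been loaded into a Map from each file's Path to its recorded last modification time in milliseconds since the epoch, the unit returned by FileTime.toMillis(). The class name StartupCheck and the maintenance-panel hook in main are illustrative; the real program would call its own GUI code there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class StartupCheck {

    // Returns every indexed file that has been deleted or whose last-modified
    // time no longer matches the value stored in the index.
    // Assumes the index stores times in milliseconds (FileTime.toMillis()).
    static List<Path> findStaleFiles(Map<Path, Long> recordedTimes) {
        List<Path> stale = new ArrayList<>();
        for (Map.Entry<Path, Long> entry : recordedTimes.entrySet()) {
            Path file = entry.getKey();
            try {
                if (!Files.exists(file)
                        || Files.getLastModifiedTime(file).toMillis() != entry.getValue()) {
                    stale.add(file);
                }
            } catch (IOException e) {
                // If the time cannot be read (e.g. deleted mid-check), treat it as changed.
                stale.add(file);
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sample", ".txt");
        long recorded = Files.getLastModifiedTime(sample).toMillis();
        Map<Path, Long> index = Map.of(sample, recorded, Path.of("deleted-file.txt"), 0L);

        List<Path> stale = findStaleFiles(index);
        if (!stale.isEmpty()) {
            // Hypothetical hook: the real program would show its maintenance panel here.
            System.out.println("Index is out of date for: " + stale);
        }
        Files.delete(sample);
    }
}
```

Comparing against the value read through the same API at indexing time avoids mismatches caused by file-system timestamp granularity.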

PHRASE search not implemented

PHRASE Search

This is the hardest search to implement. Unlike the OR and the AND searches, with PHRASE searching, the position of the search terms in the files matters. The algorithm I came up with is:

Create an initially empty Set of Pair objects.
Add to the set the Pair objects for the files that contain the first word of the phrase. This is the easy part: Just lookup that word in the Map, and add all Pair objects found to a set.
The Set now contains Pair objects for just the files that might contain the phrase. Next, loop over the remaining words of the phrase, removing any Pairs from the set that are no longer possible phrase continuations. (Actually, I just build a new Set.)

For each remaining word in the phrase:
Create a new, empty set of Pairs.
For each Pair in the previous set, see if the word appears in the same file, but in the next position. If so, add the Pair object for the word to the new set.
An example may help clarify this. Suppose the search phrase is “big top now”. The set initially contains all the Pair objects for the word “big”. Let's say that set looks like:

(file1,position7), (file1,position22), (file3,position4)
For each Pair object in that set, you need to see if “top” is in the same file, but at the next position. If so, you add the Pair object for that occurrence to the new Set. The (inner) loop for this example checks each of the following:

Is a (file1,position8) Pair object in the Map for the word "top"?

Is a (file1,position23) Pair object in the Map for the word "top"?

Is a (file3,position5) Pair object in the Map for the word "top"?
If the answer is “yes”, then add that Pair object to the new set. When this loop ends, the new set will contain the Pair objects for the phrase “big top” (pointing to the position of the word “top”).

For example, suppose “top” is only found in (file1,position8) and (file3,position5). You replace the first set with this new set:

(file1,position8), (file3,position5)
Repeat for the next word in the phrase, using the set built in the previous loop.
Continue until the set is empty (so phrase not found), or until the last word of the phrase has been processed. The Pair objects remaining in the final set are the ones that contain the phrase; the position will be that of the last word of the phrase. (We only need to display the file name; in this project, the position of the phrase doesn't matter.)
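A minimal Java sketch of the phrase algorithm above, reusing the “big top now” example. It assumes, hypothetically, that the index maps each word to a Set of Pair values and that Pair is a record of (file name, word position); records need Java 16 or later and supply the value-based equals/hashCode that the Set lookups rely on. PhraseSearch and phraseSearch are illustrative names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PhraseSearch {

    // Hypothetical Pair: one occurrence of a word in a file at a word position.
    // Records supply equals/hashCode, so Set.contains works by value.
    record Pair(String file, int position) {}

    // Returns the names of files containing the phrase words consecutively.
    static Set<String> phraseSearch(Map<String, Set<Pair>> index, List<String> phrase) {
        Set<String> result = new TreeSet<>();
        if (phrase.isEmpty()) {
            return result; // assumption: an empty phrase matches nothing
        }
        // Start with every occurrence of the first word.
        Set<Pair> current = new HashSet<>(index.getOrDefault(phrase.get(0), Set.of()));

        // For each remaining word, keep only occurrences one position further on.
        for (int i = 1; i < phrase.size() && !current.isEmpty(); i++) {
            Set<Pair> wordPairs = index.getOrDefault(phrase.get(i), Set.of());
            Set<Pair> next = new HashSet<>();
            for (Pair p : current) {
                Pair candidate = new Pair(p.file(), p.position() + 1);
                if (wordPairs.contains(candidate)) {
                    next.add(candidate); // now points at the current word
                }
            }
            current = next;
        }
        // Surviving Pairs point at the last word; only the file names are needed.
        for (Pair p : current) {
            result.add(p.file());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Pair>> index = new HashMap<>();
        index.put("big", Set.of(new Pair("file1", 7), new Pair("file1", 22), new Pair("file3", 4)));
        index.put("top", Set.of(new Pair("file1", 8), new Pair("file3", 5)));
        index.put("now", Set.of(new Pair("file1", 9)));
        System.out.println(phraseSearch(index, List.of("big", "top", "now"))); // [file1]
    }
}
```

As in the description, a new Set is built on each pass rather than removing Pairs in place, which avoids modifying a Set while iterating over it.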

test data is not present

About five plain text files containing sample sentences should be created so we can test the search functions on their contents.

inverted index file not created

The file should be a plain text document in a format such as this:

[program name] [version number]
[total number of indexed files]

[index number] [indexed file pathname] [last modification time]
[index number] [indexed file pathname] [last modification time]
[index number] [indexed file pathname] [last modification time]

[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
[word] [index number],[location] [index number],[location]
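A minimal Java sketch of a writer that produces text in the sample layout above. The record types IndexedFile and Posting, the 0-based index numbers, the example version string, and sorting the words alphabetically are all assumptions made for illustration; the issue does not specify them. Records need Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IndexFileWriter {

    // Hypothetical record of one indexed file and its last modification time.
    record IndexedFile(String path, long lastModified) {}

    // Hypothetical occurrence: the file's index number and the word's location.
    record Posting(int fileNumber, int location) {}

    static String format(String programName, String version,
                         List<IndexedFile> files, Map<String, List<Posting>> words) {
        StringBuilder sb = new StringBuilder();
        // Header: program name and version, then the number of indexed files.
        sb.append(programName).append(' ').append(version).append('\n');
        sb.append(files.size()).append('\n');
        sb.append('\n');
        // File table: index number (assumed 0-based), pathname, modification time.
        for (int i = 0; i < files.size(); i++) {
            IndexedFile f = files.get(i);
            sb.append(i).append(' ').append(f.path()).append(' ')
              .append(f.lastModified()).append('\n');
        }
        sb.append('\n');
        // Word table: word followed by "indexNumber,location" pairs, words sorted.
        for (Map.Entry<String, List<Posting>> e : new TreeMap<>(words).entrySet()) {
            sb.append(e.getKey());
            for (Posting p : e.getValue()) {
                sb.append(' ').append(p.fileNumber()).append(',').append(p.location());
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<IndexedFile> files = List.of(
                new IndexedFile("data/sample1.txt", 1700000000000L),
                new IndexedFile("data/sample2.txt", 1700000500000L));
        Map<String, List<Posting>> words = Map.of(
                "big", List.of(new Posting(0, 1), new Posting(1, 4)),
                "top", List.of(new Posting(0, 2)));
        System.out.print(format("SearchProject", "1.0", files, words));
    }
}
```

Because fields are separated by single spaces, a pathname that itself contains a space would make its line ambiguous to parse; whatever format description is written for the index would need to say how such paths are handled (for example, by quoting them).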

AND search not implemented

AND Search

This is done the opposite way from an OR search and is only a little harder to implement. The idea is to start with a Set of all files in the index. Then, for each search term, check each file in the Set to make sure it is contained in the index entry for that term, and remove any files that don't contain that word. The final Set holds the documents matching all search terms. (If the user inputs no search words, say “ ,.”, then all files are considered as matching. If that isn't the behavior you want, you need to treat it as a special case.)
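A minimal Java sketch of the AND search, assuming (hypothetically) the same word-to-file-names Map as in the OR sketch plus a separate Set of every indexed file. Set.retainAll performs the "remove any file not listed for this term" step in one call. AndSearch and andSearch are illustrative names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AndSearch {

    // AND search: start with all indexed files and drop any file lacking a term.
    // Assumes index maps word -> set of file names (a hypothetical layout).
    static Set<String> andSearch(Set<String> allFiles, Map<String, Set<String>> index,
                                 List<String> terms) {
        Set<String> matches = new TreeSet<>(allFiles);
        for (String term : terms) {
            // retainAll removes every file not listed for this term,
            // so a word missing from the index empties the set.
            matches.retainAll(index.getOrDefault(term, Set.of()));
            if (matches.isEmpty()) {
                break; // no file can match the remaining terms either
            }
        }
        return matches; // with no terms, every file matches
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("big", Set.of("file1", "file3"));
        index.put("top", Set.of("file1", "file2"));
        Set<String> all = Set.of("file1", "file2", "file3");
        System.out.println(andSearch(all, index, List.of("big", "top"))); // [file1]
        System.out.println(andSearch(all, index, List.of()));             // [file1, file2, file3]
    }
}
```

If matching nothing is preferred for an empty query, checking terms.isEmpty() at the top of andSearch is the special case mentioned above.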

inverted index format not documented

A separate file should be created that describes the index file format in enough detail that someone could write their own search program to read the data in the inverted index.
