This is the project repository of Laura Legat's and Sojeong Lee's implementation of TF-IDF scoring of the Cranfield 1400 document collection.
- Prerequisites: Python and numpy must be installed, either on your host machine or in an isolated environment such as a conda environment.
- Clone this repository with
git clone https://github.com/Laura-Legat/csci4130-Cranfield-Collection.git
or download all files directly.
- On your system or in your conda environment (the same one where numpy is installed), install the library for running Jupyter notebooks by running
pip install notebook
- Run the
proj2_final_LEGAT_LEE.ipynb
file either from your IDE or text editor, or by running the command
jupyter notebook
in a terminal in the root folder of this project. In the latter case, a browser window opens, from which the .ipynb file can be selected and executed.
- Execute all cells and read the result from the last cell, which shows the results of 20 random queries. To run a specific query manually, call the
run_query
function.
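
For readers unfamiliar with TF-IDF scoring, the following is a minimal sketch of the general technique using numpy. It is not the notebook's code: the names `tokenize`, `build_tfidf` and `rank_documents` are hypothetical, the three documents are toy strings rather than Cranfield data, and the choices of raw term counts, a plain logarithmic IDF and cosine similarity are assumptions that may differ from what `run_query` does.

```python
from collections import Counter

import numpy as np


def tokenize(text):
    # Assumption: lowercase and split on whitespace only.
    # A real pipeline would likely also strip punctuation and stop words.
    return text.lower().split()


def build_tfidf(docs):
    """Return (vocab, idf, matrix); matrix[i, j] is the TF-IDF weight of term j in document i."""
    tokenized = [tokenize(d) for d in docs]
    terms = sorted({t for doc in tokenized for t in doc})
    vocab = {t: j for j, t in enumerate(terms)}
    tf = np.zeros((len(docs), len(vocab)))  # raw term counts per document
    df = np.zeros(len(vocab))               # number of documents containing each term
    for i, doc in enumerate(tokenized):
        for term, count in Counter(doc).items():
            tf[i, vocab[term]] = count
            df[vocab[term]] += 1
    idf = np.log(len(docs) / df)            # terms found in every document get weight 0
    return vocab, idf, tf * idf


def rank_documents(query, vocab, idf, matrix):
    """Return (document indices sorted best first, cosine similarity per document)."""
    q = np.zeros(len(vocab))
    for term, count in Counter(tokenize(query)).items():
        if term in vocab:                   # query terms outside the vocabulary are ignored
            q[vocab[term]] = count * idf[vocab[term]]
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(q)
    scores = np.divide(matrix @ q, norms, out=np.zeros(len(matrix)), where=norms > 0)
    return np.argsort(-scores), scores


# Toy documents for illustration only.
docs = [
    "boundary layer flow over a flat plate",
    "heat transfer in supersonic flow",
    "wing flutter at high speed",
]
vocab, idf, matrix = build_tfidf(docs)
order, scores = rank_documents("supersonic heat transfer", vocab, idf, matrix)
print(order[0], scores)
```

In this sketch, documents that share no weighted term with the query receive a score of 0, so only the second toy document is ranked as relevant.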