Download the dataset from https://www.kaggle.com/datasets/vijayabhaskar96/tamil-news-classification-dataset-tamilmurasu
-
Requires Python 3.9 or later
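
A quick way to confirm that the active interpreter meets the 3.9 requirement is sketched below. The helper name `python_is_supported` is hypothetical and not part of this repository; it only compares the major and minor version numbers.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when (major, minor) is at least the required minimum.

    Hypothetical helper for illustration; not part of the repository.
    """
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} meets the 3.9 minimum")
    else:
        print(f"Python {current} is older than the 3.9 minimum")
```

Running this inside the conda environment before installing packages shows whether the environment was created with a suitable Python version.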
-
Anaconda Environment
-
Install the required Python packages inside an activated conda environment:
pip install -r requirements.txt
-
Create a /data folder and place the downloaded dataset inside it
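
The folder step can also be done from Python. The sketch below assumes the folder is named `data` and sits at the repository root, which is one reading of "/data" above; the helper name `prepare_data_dir` is hypothetical. It creates the folder if it is missing and lists the files already in it, so you can check that the downloaded dataset is in place before running the notebooks.

```python
from pathlib import Path


def prepare_data_dir(repo_root=".", folder_name="data"):
    """Create the data folder if needed and list the files it contains.

    Hypothetical helper; the folder location relative to the repository
    root is an assumption, not something the repository confirms.
    """
    data_dir = Path(repo_root) / folder_name
    # exist_ok=True makes the call safe to repeat.
    data_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p.name for p in data_dir.iterdir() if p.is_file())
    return data_dir, files
```

For example, `data_dir, files = prepare_data_dir()` run from the repository root returns an empty `files` list until the dataset files have been copied in.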
-
Run the preprocessing/tamilmurasu_preprocessing.ipynb notebook to preprocess the dataset
-
To test the adapted BERTopic model fine-tuned for this dataset, run BERTopic/BERTopic_final.ipynb
-
To compare different Tamil word embeddings for BERTopic and LDA, run the files in the comparison folder
-
To test the LDA model, first run the preprocessing/preprocessing_pipeline.ipynb notebook, then run comparison/LDA.ipynb
-
To test the recommender system, run recommendation_system/recommender_system_experiments.ipynb
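
The notebooks can also be run without opening the Jupyter interface. The sketch below assumes that Jupyter and nbconvert are installed in the environment (this README does not say whether requirements.txt includes them) and that the script is started from the repository root. The helper names `nbconvert_command` and `run_notebooks` are hypothetical. Only the preprocessing-first order comes from the steps above; the order of the remaining notebooks is an assumption.

```python
import subprocess

# Notebook paths taken from the steps above; preprocessing comes first.
NOTEBOOKS = [
    "preprocessing/tamilmurasu_preprocessing.ipynb",
    "BERTopic/BERTopic_final.ipynb",
    "recommendation_system/recommender_system_experiments.ipynb",
]


def nbconvert_command(notebook):
    """Build the command that executes one notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_notebooks(notebooks=NOTEBOOKS, dry_run=True):
    """Run notebooks in order; with dry_run=True only print the commands."""
    commands = [nbconvert_command(nb) for nb in notebooks]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sequence at the first failing notebook.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_notebooks(dry_run=True)
```

With `dry_run=True` nothing is executed, so the printed commands can be checked first. Calling `run_notebooks(dry_run=False)` executes each notebook and overwrites it with its outputs.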