morenobcn / cuisineclassifying

This project forked from stevenjson/cuisineclassifying

Automated Cuisine Classification of Recipes

License: MIT License

Python 0.01% HTML 99.99%

Python Instructions:

What You Need to Install

Anaconda Python 3.5 - https://www.continuum.io/downloads

Python EditDistance package - https://pypi.python.org/pypi/editdistance

Scikit Learn - http://scikit-learn.org/stable/install.html

NLTK - http://www.nltk.org/install.html

Beautiful Soup 4 - https://pypi.python.org/pypi/beautifulsoup4
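A quick way to confirm the packages above are available is to check whether their modules can be found. This is only a convenience sketch, not part of the project; it assumes the usual import names (sklearn for Scikit Learn, bs4 for Beautiful Soup 4), and the helper name missing_packages is made up for illustration.

```python
from importlib.util import find_spec

# Maps each importable module name to the package name used on PyPI.
# Assumption: these are the usual import names; two of them (sklearn, bs4)
# differ from the names used when installing.
REQUIRED = {
    "editdistance": "editdistance",
    "sklearn": "scikit-learn",
    "nltk": "nltk",
    "bs4": "beautifulsoup4",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of packages whose module cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```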

Organization

In the main directory, you will see the following files and directories:

  • Data/ - This directory holds the dataset and all of the feature data
  • Results/ - This directory holds the result files generated by Classifiers.py
  • MutInfo.py - This file calculates the pointwise mutual information of features
  • Classifiers.py - This file runs given features through the specified classifier and writes the output to the Results directory.

Other files and directories in the main directory include:

  • FileGather.py - This script downloads entire pages of recipes from allrecipes.com
  • html/ - This is the directory where FileGather.py stores the html files
  • RecipeScraper.py - This file converts the files in the html directory into the correct format and saves them in the Data directory

To Generate Data

To add recipes to the dataset, first you will need to run:

python FileGather.py [allrecipe cuisine url] [cuisine] [page number]

Example of an Allrecipes URL: http://allrecipes.com/recipes/695/world-cuisine/asian/chinese/

Once you have downloaded all the recipes you want to add to the dataset, run:

python RecipeScraper.py

This will generate all of the data and save it to the Data folder as [cuisineName].txt
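If you want several pages for one cuisine, the two steps above could be scripted along these lines. This is a sketch under one assumption that the instructions do not confirm: that [page number] selects a single listing page, so each page needs its own FileGather.py call. The helper names pipeline_commands and run_all are hypothetical.

```python
import subprocess
import sys


def pipeline_commands(url, cuisine, pages):
    """Build one FileGather.py command per page, followed by RecipeScraper.py.

    Assumption: the third FileGather.py argument is a single page number.
    """
    commands = [[sys.executable, "FileGather.py", url, cuisine, str(page)]
                for page in pages]
    commands.append([sys.executable, "RecipeScraper.py"])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Example (run from the main directory of the project):
# run_all(pipeline_commands(
#     "http://allrecipes.com/recipes/695/world-cuisine/asian/chinese/",
#     "chinese",
#     range(1, 4)))
```

Building the command lists separately from running them makes it easy to print and inspect the commands before anything is downloaded.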

To Use the Classifiers

To run the classifier you want on the dataset, simply use the following command:

python Classifiers.py [classifier you want] [feature you want]

This runs the classifier over the dataset split into folds, computing the accuracy for each fold. When the script finishes, it saves the results to the Results directory.
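To make the idea of evaluating over folds concrete, here is a minimal pure-Python sketch. It is not taken from Classifiers.py, which may split and score differently; a majority-label baseline stands in for a real classifier, and the function names are made up for illustration.

```python
from collections import Counter


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_baseline_accuracy(labels, k):
    """Return the accuracy on each fold of a stand-in classifier that always
    predicts the most common label seen in the other (training) folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_labels = [label for i, label in enumerate(labels)
                        if i not in held_out]
        prediction = Counter(train_labels).most_common(1)[0][0]
        correct = sum(1 for i in test_idx if labels[i] == prediction)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Toy labels, invented purely for illustration.
labels = ["chinese", "chinese", "chinese", "italian", "chinese", "chinese"]
per_fold = majority_baseline_accuracy(labels, k=3)
print(per_fold, sum(per_fold) / len(per_fold))
```

In practice the recipes would normally be shuffled before splitting so that each fold contains a mix of cuisines.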

To Get the Mutual Information

To find the n words with the most mutual information, use the following command:

python MutInfo.py [feature] [n] [cuisine]

You can also use the tag 'all' in place of the cuisine name to get the top n from the entire corpus.
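For reference, pointwise mutual information between a word w and a cuisine c is commonly defined as log( P(w, c) / (P(w) P(c)) ). The sketch below ranks words for one cuisine using that definition. It is an illustration, not the code in MutInfo.py: it assumes probabilities are estimated per recipe (a word counts once per recipe), uses log base 2, and does not cover the 'all' option. The function names and the toy recipes are invented for the example.

```python
import math
from collections import Counter


def pmi_scores(docs, cuisine):
    """docs is a list of (cuisine_label, list_of_words) pairs.

    Returns {word: PMI(word, cuisine)} for every word that occurs in at
    least one recipe of the given cuisine.
    """
    n_docs = len(docs)
    word_docs = Counter()   # recipes containing each word
    joint_docs = Counter()  # recipes of `cuisine` containing each word
    cuisine_docs = 0        # recipes of `cuisine`
    for label, words in docs:
        unique_words = set(words)
        word_docs.update(unique_words)
        if label == cuisine:
            cuisine_docs += 1
            joint_docs.update(unique_words)

    p_cuisine = cuisine_docs / n_docs
    scores = {}
    for word, joint in joint_docs.items():
        p_joint = joint / n_docs
        p_word = word_docs[word] / n_docs
        scores[word] = math.log2(p_joint / (p_word * p_cuisine))
    return scores


def top_n(docs, cuisine, n):
    """Return the n highest-PMI words, breaking ties alphabetically."""
    scores = pmi_scores(docs, cuisine)
    return sorted(scores, key=lambda word: (-scores[word], word))[:n]


# Toy recipes, invented purely for illustration.
toy_docs = [
    ("chinese", ["soy", "rice"]),
    ("chinese", ["soy", "ginger"]),
    ("italian", ["basil", "rice"]),
    ("italian", ["basil", "tomato"]),
]
print(top_n(toy_docs, "chinese", 2))
```

A word such as "rice", which appears equally often in both toy cuisines, gets a score of zero, while words that appear only in one cuisine score higher.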

Contributors: stevenjson
