GithubHelp home page GithubHelp logo

nyplmenus's Introduction

CREATED: 10/14/12
MOST RECENT EDIT: 10/20/12
NYPLmenus is a class for accessing the API for the New York Public Library's What's on the Menu? project. The module also includes classes for accessing recipe names from the recipes section of Wikibooks and Allrecipes.com.

The classes use the Requests library to access the data. See http://docs.python-requests.org/en/latest/ for more information. 

Class NYPLmenus:
See http://menus.nypl.org/ for more information on the WotM project itself, and http://nypl.github.com/menus-api/ for more information on the API and how to get an access token.

The class MUST be initialized with an access token. http://nypl.github.com/menus-api/ has information on how to obtain one. The __init__ is written to allow the token to either be inserted when the class is initialized, or directly into the code itself. (See the comment on line 10.)

The main method is get_menus, which accesses http://api.nypl.org/menus in one of two ways. If you supply the method with the id of a specific menu, it will retrieve that specific menu. Without a menu id, the method will retrieve menus page by page until the last page is reached, or your token's daily ratelimit is exceeded. There is also an optional max_pages parameter that, if provided, will cap the number of pages accessed.

Unless dishes is set to False, get_menus will call get_dishes for each menu accessed. Each call to get_dishes involves accessing the API and will count against your ratelimit, so if you're concerned about conserving your token's limit, and aren't concerned about dish-specific information, set dishes to False.

Menu information is stored in a dictionary, self.menus, in Unicode format. Dish information is stored as a sub-dictionary, self.menus[MENUNAME]["dishes"]. The keys of self.menus are the names of the menus, which are either the name of the restaurant and the date of the menu (in MMDDYYYY format) separated by an underscore, if the restaurant name is available, or the menu id. If the restaurant name is available but the date isn't given, the date will be NONE. Thus, potential keys are "Zen Palate_03191999", "Applebee's_NONE", or a five-digit menu ID number. Bear in mind that these keys are generated dynamically, and could change as the publicly-available data is updated. 

Each menu is stored in self.menus as a dictionary. The information stored in the dictionaries, and their keys, are as follows:
	"location" : The name of the restaurant.
	"date" : The date of the menu.
	"currency" : The currency of the menu.
	"dishes_link" : The url of the menu's dishes. This is used by get_dishes.
	"event" : The event associated with the menu.
	"id" : The menu's 5-digit id.
	"lang" : The language the menu is written in.
	"place" : The place where the menu was created. NOTE: at this point, "place" seems absent for many of the menus. Furthermore, there doesn't seem to be any consistent differentiation between "place" and "location". This could very well change as the data is filled out and normalized.
	"dishes" : The menu's dishes. See below for further information.
	
"dishes" is a dictionary itself. Its keys are the names of the dishes, its values are dictionaries with the following keys and values:
	"high" : The highest noted price of the dish.
	"low" : The lowest noted price of the dish.
	"price" : The price of the dish as given on the menu.
	"latest" : The time and date of the latest annotation of the dish, formatted YYYY-MM-DDTHH:mm:msZ

Thus, the price of a dish is stored as self.menus[MENUNAME]["dishes"][DISHNAME]["price"].
NOTE: Dish names are automatically extracted, and can be quite long, as they sometimes include descriptions of the dish. Ideally, this will be addressed in the near future.


Class Wikirecipes:
This class accesses the Recipes category of Wikibooks. Unlike the NYPLmenus class, the Wikirecipes and Allrecipes classes don't require access tokens.

The main method is get_recipes, which goes page by page through Wikibooks' recipes and adds the recipe names to self.recipes. NOTE: because there is often content overlap between pages, self.recipes is a set.


Class Allrecipes:
This class accesses Allrecipes.com and retrieves recipes course-by-course. The list of courses accessed can be found in the left sidebar of http://allrecipes.com/recipes/main.aspx (NOTE: recipes are retrieved only from the pages listed under "COURSES." The pages listed under "INGREDIENTS AND METHODS" and "OCCASIONS AND COOKING STYLES" are not accessed; it is assumed these pages contain mostly the same recipes found under "COURSES," just organized in a different way.)

There are several methods defined within this class:

get_courses retrieves the courses from the main recipes page of Allrecipes. The course names, and links to the pages that house them, are stored in self.courses. get_courses takes an optional variable parameter, *courses, which is the names (as strings) of one or more courses. If you supply course names to get_courses, only those specified course names will be retained in self.courses.

get_recipes retrieves recipe names from a supplied Allrecipes.com course page by page. get_recipes takes one mandatory argument, course, which is a tuple (coursename,courselink), where coursename is the name of the course (as a string), and courselink is the link to the (first) page for that course. self.courses is comprised of tuples formatted in exactly this way. get_recipes can also take two optional parameters. maxpage, if supplied, caps the number of pages retrieved for the supplied category at the value supplied. verbose is a boolean that, if True, causes get_recipes to print the category and page numbers as they're retrieved.

to_dict translates the results of functions that output (str,list) and translates them into a dictionary whose keys are the output strings and whose values are the output lists. to_dict takes a variable parameter *funcs, which is the names of the functions to be translated into the dictionary.

main brings all the above functions together, and could easily be the only method in Allrecipes you call. It calls get_courses, and then calls to_dict on a list of get_recipes instances called on self.courses. main takes an optional keyword parameter, **kwargs. **kwargs is passed to the get_recipes instances called on the courses in self.courses, and should thus consist of specified values for maxpage and verbose. See the documentation on get_recipes, above, for more information.

unescape is a method adapted from the original by Fredrik Lundh, adapted from the original at http://effbot.org/zone/re-sub.htm#unescape-html, via a suggestion from http://stackoverflow.com/questions/57708/convert-xml-html-entities-into-unicode-string-in-python. It takes text containing HTML-encoded characters (", e.g.) and renders them as regular text, or unicode as necessary. It takes the mandatory parameter text, which is the text to be converted.


There is also a "free-standing" method defined in the NYPLmenus module: html_pprint. This method takes a requests <response> object (see http://docs.python-requests.org/en/latest/ for more information) and turns its content into a list of strings. Each string is a line, as read by the <response> object's iter_lines method. html_pprint takes two optional parameters. decode_unicode is a boolean which is passed directly to the <response> item's iter_lines method: if True, unicode in the content of the <response> object is decoded according to the <response> object's iter_lines method. The other optional parameter pprint is a boolean that, if True, results in list of strings being printed out item-by-item, providing a pretty-printed, human-readable representation of the <response> object's content. If pprint is False, html_pprint will instead return the list of strings for further processing.

nyplmenus's People

Contributors

swizzard avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.