
Recipe Importer

Intro

Jumbo relies to a large degree on inspecting and interacting with the DOM. This exercise is meant to have a well-defined scope while still resembling the kind of work we do on a day-to-day basis.

Exercise

While our JavaScript code runs on users' mobile devices, for the purpose of this exercise we want you to expose this work as an API endpoint. Use any web framework you wish.

Create an API endpoint that takes a single parameter, url, and returns JSON data representing the recipe found at that URL.

Request

This is the request that your server should handle.

  • Method: GET
  • Path: /recipe
  • Parameters:
    • url: string

Thus a fully formed URL would look similar to localhost:8080/recipe?url=http%3A%2F%2Fwww.google.com
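As a sketch, this request shape can be handled with nothing more than Node's built-in http module; the helper name extractTargetUrl and the placeholder response body are illustrative, not part of the exercise:

```typescript
import * as http from "http";

// Pull the `url` query parameter out of the request path.
// Returns null when the parameter is missing or not itself a valid URL.
function extractTargetUrl(requestUrl: string): string | null {
  const parsed = new URL(requestUrl, "http://localhost");
  const target = parsed.searchParams.get("url");
  if (!target) return null;
  try {
    new URL(target); // validate that the parameter is a well-formed URL
    return target;
  } catch {
    return null;
  }
}

// Minimal GET /recipe endpoint; the recipe object returned here is a
// stub standing in for the actual scraping logic.
const server = http.createServer((req, res) => {
  if (req.method === "GET" && req.url?.startsWith("/recipe")) {
    const target = extractTargetUrl(req.url);
    if (!target) {
      res.writeHead(400, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "missing or invalid url parameter" }));
      return;
    }
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ name: "TODO", ingredients: [], steps: [] }));
    return;
  }
  res.writeHead(404).end();
});
```

Calling server.listen(8080) would then make the example URL above resolvable.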

Response

A JSON object with ingredients and steps lists based on the contents of the page at the requested url. Other fields may be worth including as well.

Example:

{
  "name": "Barbecue Corn on the Cob",
  "ingredients": [
    {
      "name": "Corn",
      "quantity": 2
    },
    {
      "name": "Butter",
      "quantity": 2,
      "unit": "tbsp"
    }
  ],
  "steps": [
    "Preheat grill to 400°F",
    "Once heated, scrub to clean off any encrusted pieces",
    "Grill corn on high heat for 2 minutes",
    "..."
  ]
}
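The shape above can be captured in a couple of TypeScript interfaces; the field names mirror the example, and unit is optional since "Corn" carries none:

```typescript
interface Ingredient {
  name: string;
  quantity: number;
  unit?: string; // e.g. "tbsp"; omitted for countable items like corn cobs
}

interface Recipe {
  name: string;
  ingredients: Ingredient[];
  steps: string[];
}

// The example response from above, typed:
const example: Recipe = {
  name: "Barbecue Corn on the Cob",
  ingredients: [
    { name: "Corn", quantity: 2 },
    { name: "Butter", quantity: 2, unit: "tbsp" },
  ],
  steps: [
    "Preheat grill to 400°F",
    "Once heated, scrub to clean off any encrusted pieces",
    "Grill corn on high heat for 2 minutes",
  ],
};
```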

Notes

The endpoint should work with the following examples:

Solution

At a glance there are multiple ways to solve this problem.

Parsing HTML server-side

The simplest way to solve this exercise is to parse the HTML response from the web server as-is, for example using Cheerio.js to query the HTML in a jQuery-like style, or htmlparser2. However, a significant portion of websites these days require JavaScript; some won't render anything at all with JavaScript disabled. Simply parsing the HTML body of an HTTP response is therefore fragile: even if it works with the sites above today, it may stop working at any time for lack of JS execution.

Executing a website on the server-side with a headless browser

This is a more elegant and flexible solution than the one above, since it allows any website to "render" (though render is not quite the right term in a headless environment) as it would in a normal desktop browser. The approach works with any website on the Internet, no matter how complex. The only drawback, compared to parsing the HTML returned by the web server directly, is speed. A headless browser takes time (and CPU/memory) to download the HTML and all page assets, run JavaScript (which in turn can trigger more assets being downloaded and parsed), and finally produce a DOM representation of the page. It usually takes several seconds, sometimes longer than 10s (for pages full of adware and tracking scripts), for the DOM to settle. Since we don't have any performance goals for this exercise, we can live with that.

Ways to find the content we're looking for on a webpage

There are several ways we can solve the problem of figuring out where to look for specific content (recipe name, ingredients, cooking steps, etc) on a webpage:

  • use regular expressions to find matches
    • regular expressions are hard to read and can be quite complicated
    • this approach is fragile because regular expressions treat everything as a stream of text and they don't have any understanding of the DOM
  • use CSS selectors
    • simpler to write than regular expressions
    • however, limited in what they can express: standard CSS selectors cannot match on text content or navigate from a node back to its ancestors
  • use XPath expressions
    • the syntax is only slightly more complicated than CSS selectors, and far more readable than regular expressions
    • great flexibility: any node in an HTML/DOM document can be addressed with an XPath expression
    • fully captures relationships between elements in the DOM
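To make the XPath option concrete, here are the kinds of expressions involved. One wrinkle worth a helper: XPath 1.0 has no class selector, so matching a class attribute robustly takes the classic contains(concat(...)) idiom. The class names and expressions below are hypothetical examples, not taken from any particular site:

```typescript
// Build an XPath 1.0 predicate matching elements that carry a given
// CSS class, wherever it appears inside the class attribute.
function withClass(className: string): string {
  return `contains(concat(' ', normalize-space(@class), ' '), ' ${className} ')`;
}

// Hypothetical expressions for a recipe page:
const nameXPath = `//h1[${withClass("recipe-title")}]`;
const ingredientXPath = `//ul[${withClass("ingredients")}]/li`;
// Something CSS cannot express at all: the heading *preceding* the list
// of steps, found by navigating back across the tree with an axis.
const stepsHeadingXPath = `//ol[${withClass("steps")}]/preceding-sibling::h2[1]`;
```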

A quick summary

So let's sum it up: we're going to use a headless browser (Chromium) to process a URL, and XPath expressions, specific to each website, to find the content we're looking for. The expressions are stored in config/default.json. The format is self-explanatory, except for the top-level key, which is either the full URL of a webpage or a partial URL (it should always include at least the domain); it is highly likely that all cooking recipes on a website follow the same page structure. https://devhints.io/xpath is a nice cheatsheet.
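The shape of that config and the lookup can be sketched as follows; the keys and XPath values here are made up for illustration, and the real ones live in config/default.json in the repository:

```typescript
// Per-site XPath expressions, keyed by full or partial URL.
type SiteConfig = { name: string; ingredients: string; steps: string };
type Config = Record<string, SiteConfig>;

const config: Config = {
  "www.example-recipes.com": {
    name: "//h1[@class='recipe-title']",
    ingredients: "//ul[@class='ingredients']/li",
    steps: "//ol[@class='steps']/li",
  },
};

// Pick the entry whose key occurs in the requested URL; a partial key
// (at minimum a domain) therefore covers every recipe on that site.
function findSiteConfig(url: string, cfg: Config): SiteConfig | undefined {
  const key = Object.keys(cfg).find((k) => url.includes(k));
  return key ? cfg[key] : undefined;
}
```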

Launching

Clone the source code and run the following commands in the root of the project:
