GithubHelp home page GithubHelp logo

juicebox-takehome's Introduction

image image image image

Overview

Basic organization

/news-takehome-api
  /models - shared models
  /logic - the bulk of the business logic for this
  /routes - where each endpoint/resource is defined
  /utils
  /vendors - API facades eg. wrappers for OpenAI and News API
/news-takehom-ui
  /src
    /api - API facade for BE
    /pages - top level components, if I had set up a Router, these would be the components attached to the router
    /components -small reusable components
    /constants

High-level approach

There are two endpoints:

GET /v1/news/top

POST /v1/news/content
{
  url
}

/v1/news/top powers the front listing page. This content is directly taken from the News API Top endpoint

/v1/news/content powers the named entity analysis page. At a high-level this is what it does:

  • fetch the raw HTML of a News API page (the content provided by News API is truncated)
  • use cheerio to grab the content within a <article> HTML tag (this is to reduce what we're sending over to an LLM)
  • we do an LLM call to extract the content out of the HTML
  • we do another LLM call to do named entity recognition against the content extracted from HTML
  • we serialize both the content and the named entities in a format more consumable by the Frontend to be rendered
    • see news-takehome-api/logic/content-serialization/contentAndEntitySerializer.test.js for more details

Setup

Running the API

cd ./news-takehome-api

Fill out .env-template and renamed to .env

Install deps with yarn

Run in dev mode with yarn dev

Running the FE

cd ./news-takehome-ui

Install deps with yarn

Run in dev mode with yarn start

Note: If you're running the BE on anything other than port 3001, you may need to update the proxy field in the FE package.json

Testing

I didn't get around to implementing tests, but there is a unit test at news-takehome-api/logic/content-serialization/contentAndEntitySerializer.test.js since this was pretty logic heavy

Known Bugs

Sometimes we aren't able to directly fetch HTML from the News article url and this sometimes crashes the BE

We can potentially be filling the context window, I didn't have a chance to implement document chunking

This whole thing is slow AF, I'd definitely precompute this with persistence on the BE with more time

Problems I encounterd

Figuring out how to get the content and entities into a format that worked well for rendering took longer than expected. Tbh, tried to see if ChatGPT could generate the algorithm in one go, but I ended up having to implement it by hand

juicebox-takehome's People

Contributors

alprielse avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.