prmr / creco

Recommendation System for Consumer Products

License: Apache License 2.0

Groovy 0.52% Java 88.83% CSS 2.66% JavaScript 7.99%

creco's People

Contributors

asutcl, ceipher, enewe101, forgues, mangalagb, mariamn, nishanthtgwda, prmr

creco's Issues

Obsolete branch

The feature_selection branch is now obsolete. I will delete it next week if no one is using it.

Feature-sensitive ranking algorithm

Implement the algorithm that takes as input a list of features and a category and outputs a ranked list of products. Autodetect the attribute type: more is better, less is better, or discrete.
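The issue does not say how the attribute type should be detected. One possible heuristic, shown here purely as a sketch, treats an attribute as discrete when any of its values is non-numeric, and otherwise picks the direction from the sign of the covariance between the attribute values and an overall product rating. The class name, the method signature, and the use of an overall rating are all assumptions made for illustration; none of them come from the Creco code base.

```java
import java.util.List;

/** Hypothetical sketch; none of these names come from the Creco code base. */
final class AttributeTypeDetector {

    enum AttributeType { MORE_IS_BETTER, LESS_IS_BETTER, DISCRETE }

    private AttributeTypeDetector() { }

    /**
     * Classifies one attribute. values.get(i) is the raw attribute value of the
     * product whose overall rating is ratings.get(i) (assumed to be available).
     */
    static AttributeType detect(List<String> values, List<Double> ratings) {
        if (values.isEmpty() || values.size() != ratings.size()) {
            throw new IllegalArgumentException("Need one rating per value and at least one value");
        }
        double[] numeric = new double[values.size()];
        for (int i = 0; i < numeric.length; i++) {
            try {
                numeric[i] = Double.parseDouble(values.get(i).trim());
            } catch (NumberFormatException e) {
                // A single non-numeric value makes the whole attribute discrete.
                return AttributeType.DISCRETE;
            }
        }
        double meanValue = 0.0;
        double meanRating = 0.0;
        for (int i = 0; i < numeric.length; i++) {
            meanValue += numeric[i];
            meanRating += ratings.get(i);
        }
        meanValue /= numeric.length;
        meanRating /= numeric.length;

        // Assumed heuristic: values that rise with the rating mean "more is better".
        double covariance = 0.0;
        for (int i = 0; i < numeric.length; i++) {
            covariance += (numeric[i] - meanValue) * (ratings.get(i) - meanRating);
        }
        return covariance >= 0.0 ? AttributeType.MORE_IS_BETTER : AttributeType.LESS_IS_BETTER;
    }
}
```

Once the type is known, a ranking step could normalize each numeric attribute to a common range and flip it for "less is better" attributes before combining the selected features into one score.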

[Business Logic] Feature Selection

Need to implement an object that takes a list of products and an equivalence class and outputs a set of features (ratings and specs), as well as a score for each of these features based on the inputs.

This information is then passed to the product sorter.

Clean up Data API

For all files of data and data.stubs:

  • Insert license blocks
  • Make checkstyle-compliant (except for stubs)
  • Decouple domain objects from stubs
  • Get rid of superfluous fields (they can be reintroduced on an as-needed basis)
  • Provide access to Product and Category objects directly from CRData, as opposed to ProductList and CategoryList.

Branch: Issue0022

Data API and Database testing

This issue will be used to write tests to ensure that the data API performs as we expect and, more importantly, that the database follows our assumptions.

Please feel free to suggest any tests you would like to see. Here is a list of tests I will be starting with:

  • Jaccard Index
  • Product lists (add, remove, count)
  • Attribute lists (add, remove, count)
  • AttributeStat values (min, max, Enum, count)
  • Category (getProducts, getSpecs, getRatings)
  • CategoryList
  • TypedVal
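The first item in the list refers to the Jaccard index. How Creco computes or uses it is not stated in this issue, so the following is only a reference sketch of the standard definition, |A ∩ B| / |A ∪ B|, that a test could compare against. The class name is hypothetical, and returning 1.0 for two empty sets is an assumed convention.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical reference implementation of the Jaccard index for tests. */
final class Jaccard {

    private Jaccard() { }

    /** Returns |a ∩ b| / |a ∪ b|; two empty sets are treated as identical (assumption). */
    static <T> double index(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        // |A ∪ B| = |A| + |B| - |A ∩ B|, which avoids building the union set.
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

For example, the sets {a, b, c} and {b, c, d} share two of four distinct elements, so their index is 0.5.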

Auto-complete for the search

Auto-complete for the search page with pre-populated values for the text field. Checks if the entered string is a substring of any of the category names present.

Integrate Lucene with the equivalence class search

  • Create a class which initializes an index for all equivalence classes (by collapsing all text data of all products under each equivalence class)
  • Create a method in this class to search through the index based on a string query.
  • Return a list of equivalence classes sorted from most to least relevant to the query.

Incorrect mapping for product names

We are having difficulty finding the human-readable product names. For example, in the laptop equivalence class, there is the product name "2344BKU (Black)". The product can be found here:

http://www.consumerreports.org/cro/electronics-computers/computers-internet/computers/laptop-ratings/model/overview/lenovo-2344bku-black-d80965.htm

This page contains "Product Name ThinkPad T430 2344BKU Notebook", which is the string we want, but we can't find it through the Product API. We should investigate what's the correct field in the JSON file and make sure it's loaded in the Product object.

Autocomplete Feature

A revamp of the autocomplete feature including:

  • Obtaining the list of phrases dynamically
  • Limiting the autocomplete to when at least m characters are entered (start with 2)
  • Limiting the number of autocompletions to the top n choices (start with 10)
  • Include all categories, products, and brands in the index.
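The two limits above (at least m characters, at most n completions) can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name is hypothetical, matching is an assumed case-insensitive substring test, and "top n" is interpreted as the first n matches in index order because the issue does not define a ranking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical autocomplete sketch with a minimum input length and a result cap. */
final class AutoCompleter {

    private final List<String> phrases;
    private final int minChars;
    private final int maxResults;

    AutoCompleter(List<String> phrases, int minChars, int maxResults) {
        this.phrases = new ArrayList<>(phrases); // defensive copy of the index
        this.minChars = minChars;
        this.maxResults = maxResults;
    }

    /** Returns up to maxResults phrases containing the input, or nothing if the input is too short. */
    List<String> complete(String input) {
        List<String> matches = new ArrayList<>();
        String needle = input.trim().toLowerCase(Locale.ROOT);
        if (needle.length() < minChars) {
            return matches;
        }
        for (String phrase : phrases) {
            if (matches.size() >= maxResults) {
                break;
            }
            if (phrase.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(phrase);
            }
        }
        return matches;
    }
}
```

With the suggested starting values, `new AutoCompleter(phrases, 2, 10)` returns nothing for a one-character input and at most ten phrases otherwise.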

Test Product Filter

Test for the following:

  • Return of the specified number of products.
  • Different cases of specified filters.

Test AttributeExtractor

Test:

  • Sorting
  • Score attribution
  • Handling of an empty feature list
  • Handling of heterogeneous features
  • Extracted feature type
  • Extracted default value

Features display

  • Fix bug in the display of Ratings.
  • Add tool tip description of each feature.
  • Clean code and use unified logging

Create test scenarios

Create a list of scenarios that describe one walk-through of the entire state space of Creco, from product search to feature-sensitive ranking. These can be documented in /doc/scenarios/

Errors in Master branch

I'm getting 32 errors when running the tests in the master branch.
Also a 404 (Not Found) error from jQuery in the browser console.

New UI Layout

  • Table to show categories
  • Table to show the features, with feature descriptions in tooltips and checkboxes
  • Link products to product description page.

Product Ranking

Ranking products inside the recommended list to be displayed.

Inputs:

  1. Equivalence class
  2. Lucene scores
  3. Feature selection results (constraints)
  4. Ratings for the objects

Output: a recommended list of products with ranks.

Test the search classes

Should have unit tests for ProductSearch and CategorySearch, which cover basic search functionality, such as querying by exact string match, by string with typo, etc.

Product Ranking

Unhook the product search and rank the products alphabetically. Keep the product search code around.

Master Branch Error

The application builds but crashes on the index page for me from commit 56c2086 onwards.

Can someone verify this? I reset my local branch to 6fbaca3 and the application runs.

Should we reset the master branch to this commit?

Maintenance

Cleaning up code for milestone 1
Adding tests for each component
Code engineering

"N/A" Typed Values

"N/A" and "NA" are used interchangeably in the code to specify that no value is available for an attribute. This can lead to errors.

TypedValue should also be cleaned of dead code.

Add equals and hashCode.
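One way to address both points is to normalize the "not available" markers once, at construction time, and to base equals and hashCode on the normalized state. The class below is a hypothetical stand-in, not the real TypedValue; matching the markers case-insensitively and treating null as "not available" are assumptions.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical stand-in for TypedValue showing N/A normalization and value equality. */
final class TypedValueSketch {

    private final String text; // null means "no value available"

    TypedValueSketch(String raw) {
        this.text = isNotAvailable(raw) ? null : raw.trim();
    }

    private static boolean isNotAvailable(String raw) {
        if (raw == null) {
            return true;
        }
        String marker = raw.trim().toUpperCase(Locale.ROOT);
        return marker.equals("N/A") || marker.equals("NA");
    }

    boolean isAvailable() {
        return text != null;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TypedValueSketch)) {
            return false;
        }
        return Objects.equals(text, ((TypedValueSketch) other).text);
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal values produce equal hash codes.
        return Objects.hashCode(text);
    }
}
```

With this approach, callers check `isAvailable()` instead of comparing against a literal string, so the two spellings can no longer diverge.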

Clean up HTML

The HTML resources have broken references and/or refer to unused URLs. Some of the div tags still have id "demo", etc.

Download CSS and reference from project.

Create the Data API

Make an API for loading products into the following classes:

  • Category
  • Product
  • CategoryList
  • ProductList
  • CategoryReader: reads a local JSON file and builds the CategoryList and ProductList using the above classes

The classes in the product object graph should be immutable.
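As an illustration of the immutability requirement, the sketch below uses final fields, no setters, and a defensive, unmodifiable copy of the product list. The class names and fields are hypothetical simplifications; the real Product and Category classes carry more data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable product: final fields and no setters. */
final class ImmutableProduct {

    private final String id;
    private final String name;

    ImmutableProduct(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() {
        return id;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical immutable category holding a fixed list of products. */
final class ImmutableCategory {

    private final String name;
    private final List<ImmutableProduct> products;

    ImmutableCategory(String name, List<ImmutableProduct> products) {
        this.name = name;
        // Copy first so later changes to the caller's list cannot leak in,
        // then wrap so callers cannot modify the internal list.
        this.products = Collections.unmodifiableList(new ArrayList<>(products));
    }

    String getName() {
        return name;
    }

    List<ImmutableProduct> getProducts() {
        return products;
    }
}
```

A reader class can accumulate products in a mutable list while parsing and hand the finished list to the constructor, after which the object graph no longer changes.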

General Discussion

Should task artifacts be put in the .gitignore (.gradle/1.8/taskArtifacts/*)? They seem to be generated with every build, and Eclipse asks you to commit them every time. If I understand correctly these are the files the project provides to the outside world.

Create singleton spring beans

CRData, CategorySearch and ProductSearch classes should all become singleton beans.

Each of the three classes will implement its own interface, so that we can also create mock objects to test.

Merge TypedValue and AttributeValue

Can these classes be merged and the extra functionality of AttributeValue moved into TypedValue? In any case it would be preferable to use primitive types as early as possible, i.e., as soon as the typed value is internally converted.

Create DataPath class

Create and test a class to produce a local path to the CR data dump that is based on a local properties file.
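A minimal sketch of such a class, built on `java.util.Properties`, might look like this. The property key `creco.data.path`, the class name, and the method signature are assumptions made for illustration. Reading from a `Reader` rather than a fixed file keeps the class easy to test.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical sketch: resolves paths inside the data dump from a properties source. */
final class DataPathSketch {

    /** Assumed property key; the real key name is not given in the issue. */
    static final String KEY = "creco.data.path";

    private DataPathSketch() { }

    /** Returns the configured root directory joined with the given relative segments. */
    static Path resolve(Reader propertiesSource, String... relative) {
        Properties properties = new Properties();
        try {
            properties.load(propertiesSource);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String root = properties.getProperty(KEY);
        if (root == null || root.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + KEY);
        }
        return Paths.get(root.trim(), relative);
    }
}
```

In production the reader would wrap the local properties file, which stays out of version control so that each developer can point to a different location. Note that backslashes are escape characters in properties files, so Windows paths are easier to write with forward slashes.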

[data] Product and Attribute sorting

I suggest we add a score (or ranking) field to the Products and Attributes, make them Comparable, and then override the compareTo method to sort them according to this score or ranking.

This way any controller manipulating the items could simply update the scores and then easily sort the items accordingly.
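The suggestion can be sketched with a small class that carries a mutable score and implements Comparable so that higher scores sort first. The class name and the descending order are assumptions; the issue does not say which direction the sort should take.

```java
/** Hypothetical item with a mutable score; sorts from highest to lowest score. */
final class ScoredItem implements Comparable<ScoredItem> {

    private final String name;
    private double score;

    ScoredItem(String name, double score) {
        this.name = name;
        this.score = score;
    }

    String getName() {
        return name;
    }

    double getScore() {
        return score;
    }

    /** Called by a controller before re-sorting. */
    void setScore(double score) {
        this.score = score;
    }

    @Override
    public int compareTo(ScoredItem other) {
        // Arguments are reversed so that a higher score comes first.
        return Double.compare(other.score, this.score);
    }
}
```

A controller would update the scores and then call `Collections.sort(items)`. Because this ordering is not consistent with equals, such items should not be stored in a TreeSet or used as TreeMap keys.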

Unify logging

All the logging in Creco should be done through slf4j.

  • All exceptions should be logged.
  • All calls to println should be replaced by log records.

Unify concept of category

Currently the data API has categories, franchises, equivalence classes, and subequivalence classes. I wonder if all of this is needed. The distinction between category and equivalence class is difficult to manage. It seems only equivalence classes are used by the rest of the system. I would like to propose:

  • That instead of equivalence class and category, we simply collapse "equivalence classes" into a single category upon loading the data, and only work with the resulting categories.
  • That we rename franchises to root categories.

Any comments on this? If this is a feasible change, I'll retag this issue as a maintenance issue for release 0.3.

Feature display and tweaking

The second UI view of the system, where we will show a set of features (ratings & specs) for the user to tweak.

Test the Data API

Achieve at least line coverage of all the category and product processing in CR data. The tests can be added into TestCRData.

UI Skeleton

  1. Define the web model (e.g. ProductListVO)
  2. Define the services and controllers
  3. Hook up the static HTML/JavaScript with Thymeleaf

Integrate Lucene Product search with our own product classes

  • Create a method to initialize the Lucene index based on a list of Product objects.
  • Create a method to search the index and find product matches, and return a SearchResults object.
  • Create a ScoredProduct object, which contains an object and its corresponding Lucene score.
  • Create a SearchResults object which contains both an equivalence class and a list of ScoredProducts (already sorted from highest to lowest score).

Remove the server code

From what I understand the code in creco.server is highly experimental and I don't see how it's used given that the data is loaded by CRData. I recommend that we remove this from the project altogether to clean the code base.

Refreshing the CRData can be done by a proper service, but this should be done on a separate branch after milestone 0.2.

Document the architecture

  • Produce a diagram documenting the major work areas. Upload the diagram to the wiki.
  • Produce a state diagram documenting the basic usage scenario.

Redesign the data API

  • Make the data API serve a flat list of categories.
  • Turn categories and products into immutable classes.
  • Test the categories and the category builder. Achieve 100% line coverage.
  • Expose the Product page URL.
  • Remove AttributeStat

Session Management

Define a class structure responsible for storing the user session information. Based on our discussion today, this should probably contain the following (these are just suggestions):

  • ProductScore: this class should contain a product (pointer to the data object), as well as a few score fields. There are at least two scores: a luceneScore - which is part of the output of Lucene's processing of the user's initial text query - and recommenderScore - which is responsible for ranking the products based on the user's selection of preferred feature values. Note that in practice, these scores get written during different system activities.
  • AttributeScore: this class records the user's indication 1) that she cares about a particular attribute and 2) what value she prefers. For example, if, when presented with the specifications for humidifiers, the user chooses to express the preference "color -> blue", this fact is captured and persisted here.
  • currentEquivalenceClass - the current class in which the user is browsing
    ... and probably more

Investigate what facilities Spring provides to store data in association with a user session, and design these session objects in a way that fits Spring's session paradigm.

Feature Selector

Implement a service that takes a category id and a maximum number of features n, and returns at most n features determined to distinguish the products in the category.
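The issue leaves open how "distinguishing" is measured. As one naive possibility, the sketch below ranks features by the number of distinct values they take across the products of a category and drops features that have the same value everywhere. The class name, the map-based input (standing in for a lookup by category id), and the distinct-value criterion are all assumptions, not the method used in Creco.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: picks the features whose values vary most across products. */
final class FeatureSelectorSketch {

    private FeatureSelectorSketch() { }

    /**
     * valuesByFeature maps a feature name to its values, one per product in the category.
     * Returns at most maxFeatures names, most varied first, ties broken by name.
     */
    static List<String> select(Map<String, List<String>> valuesByFeature, int maxFeatures) {
        List<String> names = new ArrayList<>(valuesByFeature.keySet());
        // A feature with a single value cannot distinguish any two products.
        names.removeIf(name -> distinct(valuesByFeature.get(name)) < 2);
        names.sort((a, b) -> {
            int byCount = Integer.compare(distinct(valuesByFeature.get(b)),
                                          distinct(valuesByFeature.get(a)));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        int limit = Math.min(Math.max(maxFeatures, 0), names.size());
        return new ArrayList<>(names.subList(0, limit));
    }

    private static int distinct(List<String> values) {
        return new HashSet<>(values).size();
    }
}
```

Counting distinct values favors identifiers and free-text fields, so a real selector would likely need a better measure; the sketch only shows the shape of the service contract (category data in, at most n feature names out).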

Document the architecture

Diagram the overall architecture and flow of activities. I might try to bust out some UML on this.

This should be a living document that represents the actual relationships that we are building. If something seems out of line with what you're building, it's a sign that a discussion needs to happen. Hopefully this will help keep us well-synced.
