prmr / creco Goto Github PK

Recommendation System for Consumer Products

License: Apache License 2.0

Groovy 0.52% Java 88.83% CSS 2.66% JavaScript 7.99%

creco's People

mariamn nguyenviethien

creco's Issues

Obsolete branch

The feature_selection branch is now obsolete. I will delete it next week if no one is using it.

Feature-sensitive ranking algorithm

Implement the algorithm that takes as input a list of features and a category and outputs a ranked list of products. Autodetect the attribute type: more is better, less is better, or discrete.
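One possible way to autodetect the attribute type is sketched below. It is only an illustration under stated assumptions: the issue does not say how detection should work, so this sketch assumes each product also has an overall score and uses the sign of the covariance between attribute value and score. `AttributeKind` and `AttributeKindDetector` are hypothetical names, not existing Creco classes.

```java
import java.util.List;

/** Hypothetical attribute kinds mirroring the three cases named in the issue. */
enum AttributeKind { MORE_IS_BETTER, LESS_IS_BETTER, DISCRETE }

final class AttributeKindDetector {
    private AttributeKindDetector() {}

    /**
     * Guesses the kind of an attribute from its values across a category.
     * values.get(i) and overallScores.get(i) must belong to the same product.
     * Assumption: an overall score per product is available to correlate against.
     */
    static AttributeKind detect(List<String> values, List<Double> overallScores) {
        if (values.isEmpty() || values.size() != overallScores.size()) {
            return AttributeKind.DISCRETE; // nothing to correlate: treat as discrete
        }
        double[] numeric = new double[values.size()];
        for (int i = 0; i < values.size(); i++) {
            try {
                numeric[i] = Double.parseDouble(values.get(i).trim());
            } catch (NumberFormatException e) {
                return AttributeKind.DISCRETE; // any non-numeric value makes it discrete
            }
        }
        double meanValue = 0.0;
        double meanScore = 0.0;
        for (int i = 0; i < numeric.length; i++) {
            meanValue += numeric[i];
            meanScore += overallScores.get(i);
        }
        meanValue /= numeric.length;
        meanScore /= numeric.length;

        // The sign of the covariance decides the direction of "better".
        double covariance = 0.0;
        for (int i = 0; i < numeric.length; i++) {
            covariance += (numeric[i] - meanValue) * (overallScores.get(i) - meanScore);
        }
        return covariance >= 0 ? AttributeKind.MORE_IS_BETTER : AttributeKind.LESS_IS_BETTER;
    }
}
```

Once the kind is known, a ranker could sort numeric attributes ascending or descending and treat discrete attributes as exact-match filters.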

[Business Logic] Feature Selection

Need to implement an object that takes a list of products and an equivalence class and outputs a set of features (ratings and specs), as well as a score for each of these features based on the inputs.

This information is then passed to the product sorter.

Clean up Data API

For all files of data and data.stubs:

Insert license blocks
Make checkstyle-compliant (except for stubs)
Decouple domain objects from stubs
Get rid of superfluous fields (they can be reintroduced on an as-needed basis)
Provide access to Product and Category objects directly from CRData, as opposed to ProductList and CategoryList.

Branch: Issue0022

Data API and Database testing

This issue will be used to write tests to ensure that the data API performs as we expect and, more importantly, that the database follows our assumptions.

Please feel free to suggest any tests you would like to see. Here is a list of tests I will be starting with:

- Jaccard Index
- Product lists (add, remove, count)
- Attribute lists (add, remove, count)
- AttributeStat values (min, max, Enum, count)
- Category (getProducts, getSpecs, getRatings)
- CategoryList
- TypedVal
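For reference when writing the first test in the list above, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below shows that definition only. How Creco's data API computes or uses the index is not stated here, and `JaccardSketch` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

final class JaccardSketch {
    private JaccardSketch() {}

    /** Returns |a ∩ b| / |a ∪ b|. */
    static <T> double jaccard(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumption: two empty sets are treated as having no similarity
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

Useful test cases include identical sets (index 1.0), disjoint sets (index 0.0), and the empty-set convention, which should match whatever the real implementation chooses.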

Auto-complete for the search

Auto-complete for the search page with pre-populated values for the text field. Checks if the entered string is a substring of any of the category names present.

Integrate Lucene with the equivalence class search

Create a class which initializes an index for all equivalence classes (by collapsing all text data of all products under each equivalence class)
Create a method in this class to search through the index based on a string query.
Return a list of equivalence classes sorted from most to least relevance to the query.

Incorrect mapping for product names

We are having difficulty finding the human-readable product names. For example, in the laptop equivalence class, there is the product name "2344BKU (Black)". The product can be found here:

http://www.consumerreports.org/cro/electronics-computers/computers-internet/computers/laptop-ratings/model/overview/lenovo-2344bku-black-d80965.htm

This page contains "Product Name ThinkPad T430 2344BKU Notebook", which is the string we want, but we can't find it through the Product API. We should investigate what's the correct field in the JSON file and make sure it's loaded in the Product object.

Autocomplete Feature

A revamp of the autocomplete feature including:

Obtaining the list of phrases dynamically
Limiting the autocomplete to when at least m characters are entered (start with 2)
Limiting the number of autocompletions to the top n choices (start with 10)
Include all categories, products, and brands in the index.
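The two limits above (at least m characters, at most n suggestions) could be sketched as follows. This is a minimal illustration, not the project's implementation: `AutocompleteSketch` and `suggest` are hypothetical names, matching is a case-insensitive substring test, and "top n" is assumed to mean the first n matches in alphabetical order because the issue does not define a ranking.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class AutocompleteSketch {
    private AutocompleteSketch() {}

    /**
     * Returns at most maxResults phrases containing the input as a substring,
     * or an empty list when fewer than minChars characters were entered.
     */
    static List<String> suggest(Collection<String> phrases, String input,
                                int minChars, int maxResults) {
        String needle = input.trim().toLowerCase(Locale.ROOT);
        if (needle.length() < minChars) {
            return Collections.emptyList();
        }
        return phrases.stream()
                .filter(p -> p.toLowerCase(Locale.ROOT).contains(needle))
                .sorted(String.CASE_INSENSITIVE_ORDER) // assumption: alphabetical "top n"
                .limit(maxResults)
                .collect(Collectors.toList());
    }
}
```

With the starting values from the issue, a call would look like `AutocompleteSketch.suggest(phrases, userInput, 2, 10)`, where `phrases` holds the category, product, and brand names.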

Test Product Filter

Test for the following:
Return of the specified number of products.
Different cases of specified filters.

Aggregate Specifications and Rankings

Given that the purpose of Creco is to simplify homing in on products, we should aggregate specs and ratings into a uniform list of attributes.

Test AttributeExtractor

Test:

Sorting
Score Attribution
Handling empty feature list
Handling of heterogeneous features
Extracted feature type
Extracted default value

Features display

Fix bug in the display of Ratings.
Add tool tip description of each feature.
Clean code and use unified logging

Empty categories in product search results

Search for "Humidifier"

The results show "Humidifier for your baby" with 0 products.

We shouldn't return empty categories.

Create test scenarios

Create a list of scenarios that describe one walk-through of the entire state space of Creco, from product search to feature-sensitive ranking. These can be documented in /doc/scenarios/

Errors in Master branch

I'm getting 32 errors when running the tests in the master branch.
I also get a 404 (Not Found) error from jQuery in the browser console.

Dynamic update of product ranking

Update the product ranking based on the user's selected features.

New UI Layout

Table to show categories
Table to show the features, with feature descriptions in tooltips and checkboxes
Link products to product description page.

Product Ranking

Ranking products inside the recommended list to be displayed.

Inputs:

Equivalence class
Lucene scores
Feature selection results - constraints.
Ratings for the objects

Output:
Recommended list of products with ranks.

Test the search classes

Should have unit tests for ProductSearch and CategorySearch, which cover basic search functionality, such as querying by exact string match, by string with typo, etc.

Product Ranking

Unhook the product search and rank the products alphabetically. Keep the product search code around.

Master Branch Error

The application builds but crashes on the index page for me from commit 56c2086 onwards.

Can someone verify this? I reset my local branch to 6fbaca3 and the application runs.

Should we reset the master branch to this commit?

Maintenance

Cleaning up code for milestone 1
Adding tests for each component
Code engineering

"N/A" Typed Values

"N/A" and "NA" are used interchangeably in the code to specify that no value was available for an attribute. This can lead to errors.

TypedValue should also be cleaned of dead code.

Add equals and hashCode.
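A minimal sketch of how the two spellings could be collapsed into one "not available" state, with matching equals and hashCode, is shown below. `NormalizedValue` is a hypothetical stand-in, not the actual TypedValue class. Treating null and empty strings as unavailable is an additional assumption.

```java
import java.util.Locale;
import java.util.Objects;

final class NormalizedValue {
    private final String text; // null means "no value available"

    private NormalizedValue(String text) {
        this.text = text;
    }

    /** Maps "N/A", "NA" (any case), empty, and null input to the same unavailable value. */
    static NormalizedValue of(String raw) {
        if (raw == null) {
            return new NormalizedValue(null);
        }
        String trimmed = raw.trim();
        String upper = trimmed.toUpperCase(Locale.ROOT);
        if (trimmed.isEmpty() || upper.equals("N/A") || upper.equals("NA")) {
            return new NormalizedValue(null);
        }
        return new NormalizedValue(trimmed);
    }

    boolean isAvailable() {
        return text != null;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NormalizedValue)) {
            return false;
        }
        return Objects.equals(text, ((NormalizedValue) o).text);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(text);
    }
}
```

Normalizing at construction time means the rest of the code only ever checks `isAvailable()` and never compares against a literal string.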

Clean up HTML

The html resources have broken references and/or refer to unused URLs. Some of the div tags still have id "demo", etc.

Download CSS and reference from project.

Create the Data API

Make an API for loading products into the following classes:

Category
Product
CategoryList
ProductList
CategoryReader --> reads a local JSON file and builds the CategoryList and ProductList using the above classes

The classes in the product object graph should be immutable.
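As an illustration of the immutability requirement, the sketch below uses final fields, no setters, and a defensive copy of the product list. `ImmutableProduct` and `ImmutableCategory` are hypothetical names with made-up fields. The real Product and Category classes presumably carry more data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ImmutableProduct {
    private final String id;
    private final String name;

    ImmutableProduct(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }
}

final class ImmutableCategory {
    private final String name;
    private final List<ImmutableProduct> products;

    ImmutableCategory(String name, List<ImmutableProduct> products) {
        this.name = name;
        // Copy first so later changes to the caller's list cannot leak in,
        // then wrap so callers cannot modify the internal list.
        this.products = Collections.unmodifiableList(new ArrayList<>(products));
    }

    String getName() { return name; }
    List<ImmutableProduct> getProducts() { return products; }
}
```

A reader class can collect products in an ordinary mutable list while parsing and hand it to the constructor once the category is complete.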

General Discussion

Should task artifacts be put in the gitignore (.gradle/1.8/taskArtifacts/*)? They seem to be generated with every build, and Eclipse asks you to commit them every time. If I understand correctly these are the files the project provides to the outside world.

Generate file headers with copyright notice

Setup some tooling to automatically generate file headers with license and copyright information.

Main search ui and result products display

Creating the .html view files for both Search UI and the returned products display.
Map the data objects to the front-end model in the controllers.

Create singleton spring beans

CRData, CategorySearch and ProductSearch classes should all become singleton beans.

Each of the three classes will implement its own interface, so that we can also create mock objects to test.

Merge TypedValue and AttributeValue

Can these classes be merged and the extra functionality of AttributeValue moved into TypedValue? In any case it would be preferable to use primitive types as early as possible, i.e., as soon as the typed value is internally converted.

Create DataPath class

Create and test a class to produce a local path to the CR data dump that is based on a local properties file.
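A minimal sketch of such a class follows. The property key `creco.data.path` and the class name `DataPathSketch` are assumptions made for illustration. The issue does not specify either. Reading from a `Reader` rather than a fixed file keeps the class easy to test.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class DataPathSketch {
    /** Hypothetical property key; the real key name is not given in the issue. */
    static final String KEY = "creco.data.path";

    private DataPathSketch() {}

    /** Reads the properties and returns the configured path to the data dump. */
    static Path fromProperties(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + KEY);
        }
        return Paths.get(value.trim());
    }
}
```

In production code the reader would typically wrap the local properties file, while a test can pass a `StringReader` with the content it needs.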

[data] Product and Attribute sorting

I suggest we add a score (or ranking) field to the Products and Attributes, make them Comparable, and then override the compareTo method to sort them according to this score or ranking.

This way any controller manipulating the items could simply update the scores and then easily sort the items accordingly.
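The suggestion can be sketched roughly as follows. `RankedItem` is a hypothetical stand-in for Product or Attribute. Sorting from highest to lowest score, and breaking ties by name, are assumptions.

```java
final class RankedItem implements Comparable<RankedItem> {
    private final String name;
    private double score;

    RankedItem(String name, double score) {
        this.name = name;
        this.score = score;
    }

    String getName() { return name; }
    double getScore() { return score; }

    /** Lets a controller update the score before re-sorting. */
    void setScore(double score) { this.score = score; }

    @Override
    public int compareTo(RankedItem other) {
        // Higher scores sort first; ties fall back to the name for a stable order.
        int byScore = Double.compare(other.score, this.score);
        return byScore != 0 ? byScore : name.compareTo(other.name);
    }
}
```

A controller would then call `setScore` on the items and `Collections.sort(items)` to reorder them. One caveat of this design is that a mutable score conflicts with keeping items in sorted collections such as `TreeSet`, and with the goal of immutable domain objects stated elsewhere in these issues.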

Unify logging

All the logging in Creco should be done through slf4j.

All exceptions should be logged.
All calls to println should be replaced by log records.

Unify concept of category

Currently the data API has categories, franchises, equivalence classes, and subequivalence classes. I wonder if all of these are needed. The distinction between category and equivalence class is difficult to manage. It seems only equivalence classes are used by the rest of the system. I would like to propose:

That instead of equivalence class and category, we simply collapse "equivalence classes" into a single category upon loading the data, and only work with the resulting categories.
To rename the franchise to root categories.

Any comments on this? If this is a feasible change I'll retag this issue as a maintenance issue for release 0.3

Feature display and tweaking

The second UI view of the system, where we will show a set of features (ratings & specs) for the user to tweak.

Test the Data API

Achieve at least line coverage of all the category and product processing in CR data. The tests can be added into TestCRData.

Accents don't show up properly in the Category List

Search for "Sparking Ros" and you should get a category intended to be "Sparkingly Rosé" that's got a messed up acute accent.

UI Skeleton

Define the web model (e.g. ProductListVO)
Define the services and controllers
Hook up the static HTML/JavaScript with Thymeleaf

Create a feature specification file

Create a file listing all the features of Creco.

Integrate Lucene Product search with our own product classes

Create a method to initialize the Lucene index based on a list of Product objects.
Create a method to search the index and find product matches, and return a SearchResults object.
Create a ScoredProduct object, which contains an object and its corresponding Lucene score.
Create a SearchResults object which contains both an equivalence class and a list of ScoredProducts (already sorted from highest to lowest score)

Remove the server code

From what I understand the code in creco.server is highly experimental and I don't see how it's used given that the data is loaded by CRData. I recommend that we remove this from the project altogether to clean the code base.

Refreshing the CRData can be done by a proper service, but this should be done on a separate branch after milestone 0.2

Document the architecture

Produce a diagram documenting the major work areas. Upload the diagram to the wiki.
Produce a state diagram documenting the basic usage scenario.

[Interface] Define interface for business logic

Search Service
...
more to be added later.

Persistence (Startup - Retrieval of information from consumer database and validation )

At application startup, the system automatically checks, using the local data path, whether the local JSON files are stale (not pulled from the consumer API in the last 24 hours) and, if so, pulls a new set of files.

Product/Category objects are built after the data is persisted.
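The 24-hour staleness check could be sketched as below. This is an illustration only: `StalenessCheck` is a hypothetical name, and using the file's last-modified time as the time of the last pull is an assumption. A missing file is treated as stale so that the first startup triggers a pull.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

final class StalenessCheck {
    /** Maximum age from the issue text: 24 hours. */
    static final Duration MAX_AGE = Duration.ofHours(24);

    private StalenessCheck() {}

    /** True if the last pull happened more than maxAge before now. */
    static boolean isStale(Instant lastPulled, Instant now, Duration maxAge) {
        return lastPulled.plus(maxAge).isBefore(now);
    }

    /** Assumption: the file's last-modified time marks the last pull. */
    static boolean isStale(Path file, Instant now) {
        if (!Files.exists(file)) {
            return true; // nothing on disk yet, so a pull is needed
        }
        try {
            return isStale(Files.getLastModifiedTime(file).toInstant(), now, MAX_AGE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing `now` as a parameter instead of calling `Instant.now()` inside the method keeps the check deterministic in tests.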

Redesign the data API

Make the data API serve a flat list of categories.
Turn categories and products into immutable classes.
Test the categories and the category builder. Achieve 100% line coverage.
Expose the Product page URL.
Remove AttributeStat

Session Management

Define a class structure responsible for storing the user session information. Based on our discussion today, this should probably contain the following (these are just suggestions):

ProductScore: this class should contain a product (pointer to the data object), as well as a few score fields. There are at least two scores: a luceneScore - which is part of the output of Lucene's processing of the user's initial text query - and recommenderScore - which is responsible for ranking the products based on the user's selection of preferred feature values. Note that in practice, these scores get written during different system activities.
AttributeScore: this class records the user's indication 1) that she cares about a particular attribute and 2) what value she prefers. For example, if, when presented with the specifications for humidifiers, the user chooses to express the preference "color -> blue", this fact is captured and persisted here.
currentEquivalenceClass - the current class in which the user is browsing
... and probably more

Investigate what facilities Spring provides to store data in association with a user session, and design these session objects in a way that fits Spring's session paradigm.
