GithubHelp home page GithubHelp logo

astra-ai-demo-central-group-'s Introduction

astra-ai-demo

Multilingual Text Similarity with Vector Search

Demonstrate Datastax Astra's Vector search with Text similarity search using Retail eCommerce product Dataset

Demo UI

https://github.com/krishnannarayanaswamy/astra-ai-demo/blob/main/demoui.png

Some guidance below on how you can learn to do text similarity with Vector search

This repository includes 2 sections

  1. Backend
  • Generate Vector embeddings for product dataset in language other than english
  • Load Vector embeddings into Astra
  • Exposes an API to perform Vector search and retrieve similar products
  1. Frontend
  • Chat GPT like interface built in react to search items by context for similar items

1. Backend

The [loadDataEmbed.py] python code creates the embeddings using a Multilingual model and stores in Astra VectorDB. Refer to Cohere Multi Lingual Model for how to generate embeddings for text in other language other than English. The dimensions of the multilingual embeddings is 768 dimensions.

The code also creates required tables and the indexes in Astra

The sample Dataset includes 52,000 products.

Setup to run the backend

Review the Astra Getting started guide, if needed.

Create a new vector search enabled database in Astra. astra.datastax.com

For the easy path, name the keyspace in that database with the name, as required.

Create a token with permissions to create tables Download your secure-connect-bundle zip file. Set up an open.ai API account and generate a key Set up an cohere API account and generate a key Create an .env file with the below keys and update the Environment Variables cell

    openai_api_key = "<open api key>"
    cass_user = '<client id from the astra token>'
    cass_pw = '<client password from the astra token>'
    scb_path = '<path to secure connect bundle>'x`
    keyspace='<your keyspace>'
    table='<your table>'
    data_file='<your dataset>'
    coherekey='<cohere key>'

If you are changing your dataset review the below code to create the table and indexes and modify the columns appopriately

    session.execute(f"""CREATE TABLE IF NOT EXISTS {keyspace}.{table_name}
    (product_id int,
    chunk_id int,
    title text,
    description text,
    link text,
    imagelink text,
    availability text,   
    price text,
    brand text,
    condition text,
    producttype text,
    saleprice text,                                              
    openai_description_embedding vector<float, 768>,
    minilm_description_embedding vector<float, 384>,
    PRIMARY KEY (product_id,chunk_id))""")

    # # Create Index
    session.execute(f"""CREATE CUSTOM INDEX IF NOT EXISTS openai_desc ON {keyspace}.{table_name} (openai_description_embedding) USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'""")
    session.execute(f"""CREATE CUSTOM INDEX IF NOT EXISTS minilm_desc ON {keyspace}.{table_name} (minilm_description_embedding) USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'""")
    session.execute(f"""CREATE CUSTOM INDEX IF NOT EXISTS title_index ON {keyspace}.{table_name} (title) USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'""")

Run the below command to generate embeddings and load data

python3 loadDataEmbed.py

Generate Vector embedding and Load data

Run the below command and it should pick up the dataset in csv file ,create table, generate embeddings for the product description and price in the dataset and store in AstraDB

python3 similaritysearch.py

Similarity Search API

The API in [similaritysearch.py] queries that table and uses the results to give ChatGPT some context to support it's response. The source sample database is mostly consumer brick and mortar products. Here we use the same cohere API that we used to calculate embeddings for each row in the database, but this time we are using your input question to calculate a vector to use in a query. The query vector has the same dimensions (number of entries in the list) as the embeddings we generated for each row in the database. We fetch the top 5 results using ANN Similarity and build a prompt with which we'll query ChatGPT. Note the "roles" in this little conversation give the LLM more context about who that part of the conversation is coming from.

The codes uses a Dataset in thai language. If you prefer to use in other language, modify Line 112 the below code.

thai_response=translator.translate(human_readable_response, dest='th')

API Info

Request

POST /similaritems
Content-type: application/json

{"newQuestion":"คำแนะนำเกี่ยวกับการออกแบบห้องน้ำหลักใหม่ด้วยสไตล์โมเดิร์นและอุปกรณ์ที่เข้าชุดกัน"}

Query

SELECT *
    FROM {keyspace}.{table_name}
    ORDER BY openai_description_embedding ANN OF {embeddings} LIMIT 5"

Frontend

In App.js

Replace Line 38 with your API Server URL

const response = await axios.post('http://localhost:9000/similaritems', { newQuestion });

For the first time

npm install

and start the frontent app using

npm start

Credits And References

The react app was generated using Create React App

The Chat GPT clone was built by with this reference

The Multi lingual model code was built using this reference

The google translation python code was built using this reference

The python backend code was built using this colab notebook from Kiyu Gabriel as reference

astra-ai-demo-central-group-'s People

Contributors

krishnannarayanaswamy avatar

Stargazers

benzativit avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.