I'm a Data Scientist currently working at Naviga AI, where I specialize in developing sophisticated chatbots and customized multi-tool agents using the LangChain and LangGraph frameworks. My work focuses on optimizing Retrieval-Augmented Generation (RAG), developing secure and efficient chatbot tools, and building back-end services with FastAPI. My repositories showcase a range of projects, many of which draw on my experience in machine learning, predictive modeling, and optimization.
Here are a few notable projects I have worked on:
- TalkYou: TalkYou is an open-source project that lets users chat with any YouTube video. It provides a customized chatbot experience that goes beyond conversation: it can also retrieve images from the video based on user queries. The project uses LangChain and LangGraph as its backbone to achieve conditional tool calling.
- LLMRoboFund: A financial assistant chatbot equipped with a customized multi-tool agent for Text-to-SQL queries and retrieval-augmented generation (RAG). Optimized for efficient and accurate investment research.
- Forecasting Hourly Electricity Prices: Developed a suite of predictive models for hourly electricity prices, including a Long Short-Term Memory (LSTM) neural network for time-series forecasting and an XGBoost regression model for one-step-ahead (t+1) forecasts.
- Econ Dashboard: Created a comprehensive financial dashboard that consolidates financial and economic data for analysis, sentiment classification, and time-series forecasting. The dashboard integrates custom-built neural network models, including a sentiment classifier with pre-trained embeddings and LSTM models tailored to different market capitalizations.
- Financial Sentiment Classifier: Developed a natural language processing (NLP) model to assess the sentiment of financial and economic texts, such as commentary, tweets, and news articles. The model is built on the Universal Sentence Encoder (USE) and fine-tuned on the "AllAgree" subset of the Financial PhraseBank dataset (sentences on which all annotators agreed) for sentiment classification in the financial domain.
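Forecasting t+1 (or, more generally, t+n) values with a regressor such as XGBoost usually starts by reframing the time series as a supervised learning table. The following is a minimal sketch under stated assumptions, not the code from the repository: the helper `make_supervised`, the sample prices, and the hypothetical hourly `load` series used as the exogenous variable are all illustrative. Each row holds the last `n_lags` target values plus the last `n_lags` exogenous values, and the label is the target `horizon` steps ahead.

```python
from typing import List, Sequence, Tuple


def make_supervised(
    target: Sequence[float],
    exog: Sequence[float],
    n_lags: int,
    horizon: int = 1,
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: turn a target series plus one exogenous series
    into (X, y) rows for a generic regressor.

    Row features = last `n_lags` target values + last `n_lags` exogenous values.
    Label        = target value `horizon` steps after the newest feature.
    """
    if len(target) != len(exog):
        raise ValueError("target and exog must have the same length")
    if n_lags < 1 or horizon < 1:
        raise ValueError("n_lags and horizon must be >= 1")

    X: List[List[float]] = []
    y: List[float] = []
    # t is the index of the most recent observation known at prediction time
    for t in range(n_lags - 1, len(target) - horizon):
        lagged_target = list(target[t - n_lags + 1 : t + 1])
        lagged_exog = list(exog[t - n_lags + 1 : t + 1])
        X.append(lagged_target + lagged_exog)
        y.append(target[t + horizon])
    return X, y


if __name__ == "__main__":
    # Illustrative (made-up) hourly prices and an assumed exogenous load series
    prices = [10.0, 11.0, 12.0, 13.0, 14.0]
    load = [1.0, 2.0, 3.0, 4.0, 5.0]
    X, y = make_supervised(prices, load, n_lags=2, horizon=1)
    for row, label in zip(X, y):
        print(row, "->", label)
```

With `n_lags=2` and `horizon=1`, the first row is `[10.0, 11.0, 1.0, 2.0]` with label `12.0`. Raising `horizon` shifts the label further ahead while keeping only information available at time t in the features, which avoids look-ahead leakage.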
I occasionally write blog posts on various topics on Medium. Here are some of my recent articles:
- Leveraging Lagged Exogenous Variables For Time-Series Forecasting - Without Time: Explores how ML models can forecast t+n horizons using lagged exogenous variables. An LSTM neural network built with PyTorch nn.LSTM layers is also trained to serve as a benchmark for comparison.
- House Price Prediction: Stochastic Gradient Boosting With KNN Imputer for Pre-processing: An in-depth analysis of Kaggle's House Prices competition, focusing on KNNImputer for missing-value imputation, feature engineering, model building with stochastic gradient boosting, hyperparameter tuning with Optuna, and SHAP-based feature selection.
- Credit Score Prediction With Multi-Model Ensemble Voting Classifier: A step-by-step walkthrough, from data cleaning and interpolation to building a voting classifier that combines several machine learning algorithms, intended as an introduction to building classification models.
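A voting classifier combines the label predictions of several models into one. The sketch below illustrates hard (majority) voting in plain Python; it is a hypothetical example, not the code from the article. The function name `hard_vote` and the credit-score labels are assumptions, and ties are broken by picking the smallest label, an arbitrary but deterministic choice. In practice, scikit-learn's `VotingClassifier` provides hard and soft voting out of the box.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Hypothetical helper: majority vote over per-model predictions.

    predictions[m][i] is model m's predicted label for sample i.
    Ties are broken by choosing the smallest label (alphabetical here).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined: List[str] = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i
        counts = Counter(model_preds[i] for model_preds in predictions)
        top = max(counts.values())
        # Among the labels with the most votes, pick the smallest one
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


if __name__ == "__main__":
    # Assumed outputs of three already-trained models on three samples
    model_a = ["Good", "Poor", "Standard"]
    model_b = ["Good", "Standard", "Standard"]
    model_c = ["Poor", "Poor", "Good"]
    print(hard_vote([model_a, model_b, model_c]))
```

Here each sample's final label is whichever label at least two of the three models agree on, which is why combining diverse models can smooth out individual models' mistakes.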