
portfolio's Introduction

My Project Portfolio

Stock Price Prediction project leveraging LSTM and BERT models to analyze context from daily news headlines for forecasting stock movements.

  • Aggregated and processed over 100,000 news articles, aligning them with historical stock data from the S&P 500.
  • Developed a baseline LSTM model using stock data, and an advanced hybrid model combining LSTM with BERT embeddings of news headlines.
  • Implemented a custom tokenization strategy to handle BERT's token limit, creating daily news embeddings that represent the context for each day.
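
The custom tokenization strategy is not described here, so the following is only a minimal sketch of one common way to work around BERT's 512-token input limit: split a day's concatenated headline tokens into fixed-size chunks, embed each chunk, and average the chunk embeddings into one daily vector. The names `chunk_token_ids` and `mean_pool`, the chunk size, and the toy vectors are all assumptions for illustration, not the project's actual code.

```python
from typing import List, Sequence

MAX_TOKENS = 512               # BERT's maximum input length, including [CLS] and [SEP]
BODY_TOKENS = MAX_TOKENS - 2   # content tokens that fit in one chunk


def chunk_token_ids(token_ids: Sequence[int], size: int = BODY_TOKENS) -> List[List[int]]:
    """Split one day's concatenated headline tokens into chunks that fit the limit."""
    return [list(token_ids[i:i + size]) for i in range(0, len(token_ids), size)]


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-chunk embeddings into a single daily embedding."""
    if not vectors:
        raise ValueError("need at least one chunk embedding")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Toy usage: 1,200 fake token ids are split into three chunks.
chunks = chunk_token_ids(list(range(1200)))
chunk_lengths = [len(c) for c in chunks]

# Stand-in 2-dimensional vectors, one per chunk (hypothetical values, not BERT output).
daily_embedding = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

In a real pipeline each chunk would be passed through the BERT encoder before pooling; mean pooling is just one possible aggregation choice.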

Developed a logistic regression model to predict the one-year Probability of Default (PD) for prospective borrowers to enhance the bank's loan underwriting process.

  • Trained a logistic regression model on historical bank transaction data to forecast default probabilities, emphasizing model explainability.
  • Engineered financial features like liquidity, debt coverage, and profitability, and handled missing data through finance-based and median imputation.
  • Performed feature selection using univariate and multivariate analysis, and addressed multicollinearity using the Variance Inflation Factor (VIF).
  • Implemented walk-forward analysis and calibration for model evaluation, achieving an AUC of 0.7761, significantly improving upon a baseline model (AUC of 0.701).
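
As a rough illustration of the evaluation described above, the sketch below builds expanding-window walk-forward splits and computes AUC from its pairwise definition. The function names, the `min_train` parameter, the years, and the toy scores are assumptions for illustration; the project's actual split scheme and data are not shown in this summary.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(periods: Sequence[int], min_train: int = 2) -> Iterator[Tuple[List[int], int]]:
    """Yield (training periods, test period) pairs with an expanding training window."""
    ordered = sorted(set(periods))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical yearly periods: each model is tested only on data after its training window.
splits = list(walk_forward_splits([2015, 2016, 2017, 2018]))

# Hypothetical labels (1 = default) and predicted default probabilities.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise AUC is quadratic in the number of observations, which is fine for a demonstration but too slow for large loan books; a rank-based implementation would be the usual choice there.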

NLP project using Latent Dirichlet Allocation (LDA) to perform topic modeling on text data scraped from a news website, then refining the LDA topics into more nuanced groupings by incorporating K-means clustering and a BERT model.

  • Scraped news articles from the web and stored them in Amazon S3.
  • Performed topic modeling with an LDA model and improved on this baseline by applying K-means clustering and BERT.
  • Evaluated performance using metrics such as topic coherence and silhouette score.
  • Visualized the word importance per topic and summarized each article using BART.
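
For readers unfamiliar with the silhouette score mentioned above, here is a small self-contained sketch of how it is computed for a clustering. It uses plain Euclidean distance on toy 2-D points; in the project the inputs would presumably be document embeddings, which is an assumption. The function name and data are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores: List[float] = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # a singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two well-separated toy groups, scored under a sensible and a poor labeling.
toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])
```

Scores near 1 indicate compact, well-separated clusters, while negative scores suggest points sit closer to another cluster than to their own.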

Big data project using PySpark to extract and transform large-scale music listening data stored in HDFS and build a collaborative-filtering recommender system.

  • Developed a collaborative filtering recommender system using an Alternating Least Squares (ALS) model
  • Evaluated the model against a popularity baseline model
  • Used Mean Average Precision at K (MAP@K) metric for performance assessment
  • The ALS model outperformed the popularity baseline, with a 16.7x improvement in MAP@100.
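
The MAP@K metric used above can be sketched in a few lines of plain Python. Definitions of AP@K differ in how they normalize; this version divides by min(number of relevant items, K), which is an assumption and may not match the exact variant used in the project. All names and the toy data are hypothetical.

```python
from typing import Dict, Sequence, Set


def average_precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@K for one user; assumes the recommendation list has no duplicates."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant), k)


def map_at_k(recs: Dict[str, Sequence[str]], truth: Dict[str, Set[str]], k: int) -> float:
    """Mean of AP@K over every user that has ground-truth interactions."""
    users = list(truth)
    return sum(average_precision_at_k(recs.get(u, []), truth[u], k) for u in users) / len(users)


# Hypothetical recommendations and held-out listens for two users.
toy_recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
toy_truth = {"u1": {"a", "c"}, "u2": {"z"}}
toy_map = map_at_k(toy_recs, toy_truth, k=3)
```

A popularity baseline can be scored with the same function by passing every user the same list of most-played items.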

Data engineering project using Airflow to run an ETL process on Twitter data, executing the tasks inside Docker containers

  • Extracted data through Twitter API calls (Tweepy)
  • Transformed JSON data into CSV using Python scripts and libraries (Pandas, JSON)
  • Loaded the processed data into AWS S3 buckets for storage using the AWS SDK for Python (boto3)
  • Orchestrated the ETL process with Airflow and ran the tasks inside Docker for a controlled, isolated environment, which improved development and testing and made deployment faster.
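
The transform step above can be illustrated with a minimal sketch that flattens a JSON array of tweet records into CSV text. The project used Pandas; this version swaps in the standard-library `json` and `csv` modules so it runs without extra dependencies. The column names in `FIELDS` and the sample records are hypothetical, not the real Twitter payload.

```python
import csv
import io
import json
from typing import Sequence

FIELDS = ["id", "created_at", "text"]  # hypothetical columns to keep


def tweets_json_to_csv(raw_json: str, fields: Sequence[str] = FIELDS) -> str:
    """Convert a JSON array of tweet objects into CSV text with a header row."""
    records = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keep only the chosen columns; missing keys become empty cells.
        writer.writerow({f: record.get(f, "") for f in fields})
    return buffer.getvalue()


# Made-up sample payload: an extra field is dropped and a missing one is left blank.
sample = json.dumps([
    {"id": 1, "created_at": "2023-01-01", "text": "hello, world", "lang": "en"},
    {"id": 2, "text": "no date"},
])
csv_text = tweets_json_to_csv(sample)
```

The resulting string could then be handed to the load step for upload to S3; that part is omitted here because it needs credentials and a bucket.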

Analysis of health insurance premium data using causal inference and machine learning methods

  • Conducted power analysis and hypothesis testing to identify relationships between features
  • Compared LASSO, Ridge, and Elastic Net regression to identify the regularized regression model with the best predictive performance
  • Applied PCA, K-means clustering, and an XGBoost classifier (a tree-based model) to predict a person’s diabetes status
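
To make the relationship between the three regularized models concrete, the sketch below computes only the penalty term under one common parametrization, in which `l1_ratio = 1` gives the LASSO penalty and `l1_ratio = 0` gives a Ridge-style penalty. The parametrization, function name, and toy weights are assumptions for illustration; no model is fitted here.

```python
from typing import Sequence


def elastic_net_penalty(weights: Sequence[float], alpha: float, l1_ratio: float) -> float:
    """Penalty = alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2)."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


toy_weights = [3.0, -4.0]  # hypothetical coefficients
lasso_penalty = elastic_net_penalty(toy_weights, alpha=1.0, l1_ratio=1.0)
ridge_penalty = elastic_net_penalty(toy_weights, alpha=1.0, l1_ratio=0.0)
mixed_penalty = elastic_net_penalty(toy_weights, alpha=1.0, l1_ratio=0.5)
```

Comparing the three models then amounts to tuning `alpha` (and `l1_ratio` for Elastic Net) by cross-validation and comparing held-out error.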

Analysis of dummy bank customer data from the United Kingdom

  • Added customizable balance and age bands, with interactive filters triggered by clicking on figures
  • Segmented the customer distributions according to their age, bank balance, location, and gender
  • Created an interactive dashboard to derive business insights
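
The dashboard tool is not named here, but the idea behind customizable bands can be sketched independently of it: bucket each value into a band whose width is a parameter, then count customers per band. The function name, band widths, and customer values below are hypothetical.

```python
from collections import Counter


def band_label(value: float, width: int) -> str:
    """Return the inclusive band containing value, e.g. 37 with width 10 -> '30-39'."""
    lower = int(value // width) * width
    return f"{lower}-{lower + width - 1}"


# Made-up customer ages and balances, grouped with two different band widths.
toy_ages = [23, 37, 31, 45, 39]
toy_balances = [1200.0, 12500.0, 14999.0]

age_bands = Counter(band_label(age, 10) for age in toy_ages)
balance_bands = Counter(band_label(balance, 5000) for balance in toy_balances)
```

Changing the `width` argument regroups the same customers, which is what a band-size control in a dashboard does behind the scenes.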

Exploratory data analysis on research grant data retrieved from the National Science Foundation

  • Transformed XML-formatted, non-relational data into a tabular, relational format
  • Pre-processed the data by removing any stopwords and missing data
  • Performed text analysis using NLTK classes (such as PlaintextCorpusReader and BigramCollocationFinder) to segment the text into single words, two-word phrases, and three-word phrases and count their frequencies
  • Used machine learning algorithms (K-means clustering, XGBoost, Random Forest) and network analysis (NetworkX) to draw insights
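
The phrase-frequency step above can be sketched without NLTK. The version below is a plain-Python stand-in, not the project's code: it uses `re` and `collections.Counter` in place of PlaintextCorpusReader and BigramCollocationFinder, and the stopword list and sample sentence are tiny hypothetical examples.

```python
import re
from collections import Counter
from typing import List

STOPWORDS = {"the", "of", "and", "in", "a", "to", "for"}  # tiny illustrative list


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Made-up abstract text. Stopwords are removed first, so n-grams can span a dropped word.
toy_text = "Deep learning for climate modeling and deep learning for protein folding"
toy_tokens = tokenize(toy_text)
unigrams = ngram_counts(toy_tokens, 1)
bigrams = ngram_counts(toy_tokens, 2)
trigrams = ngram_counts(toy_tokens, 3)
```

`bigrams.most_common(10)` would then list the ten most frequent two-word phrases in the corpus.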

Analysis of crime incidents in New York City (NYC) parks to promote safer parks. You can customize the county and view the parks along with their reported crime incidents.

  • Conducted data pre-processing after importing a PDF data file
  • Combined the PDF data with spatial files using inner join (on park names)
  • Designed an attention-grabbing visualisation that can be used to make the case for more police patrols in NYC public parks.
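
The inner join on park names can be illustrated with a small sketch. It normalizes names before matching, since text extracted from a PDF often differs in case or spacing from names in spatial files; that normalization step is an assumption, as are the field names and the made-up records.

```python
from typing import Dict, List


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so near-identical park names match."""
    return " ".join(name.lower().split())


def inner_join_on_park(crime_rows: List[Dict], park_shapes: List[Dict]) -> List[Dict]:
    """Keep only crime rows whose park also appears in the spatial data."""
    shapes_by_name = {normalize(s["park"]): s for s in park_shapes}
    joined = []
    for row in crime_rows:
        shape = shapes_by_name.get(normalize(row["park"]))
        if shape is not None:
            joined.append({**shape, **row})  # row values win on shared keys
    return joined


# Hypothetical records standing in for the PDF table and the spatial file attributes.
toy_crimes = [
    {"park": "Central Park", "incidents": 12},
    {"park": "Unknown Lot", "incidents": 1},
]
toy_shapes = [
    {"park": "central  park", "county": "New York"},
    {"park": "Prospect Park", "county": "Kings"},
]
toy_joined = inner_join_on_park(toy_crimes, toy_shapes)
```

Because this is an inner join, parks missing from either source are silently dropped, so it is worth checking how many rows fail to match before plotting.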

portfolio's People

Contributors

choijin
