hanfei1986 Goto Github PK

followers: 8 following: 9 repos: 45 gists: 0

Name: Fei Han

Type: User

Bio: Senior Principal Data Scientist | Machine Learning and Generative AI | Semiconductor Specialist | Physicist

Location: New Jersey

Fei Han's Projects

bar-chart-race

A bar chart race is an elegant animation that depicts how multiple categories progress over time. This project shows how to create one in Python.

batch-reading-of-neural-network-training-and-visualization-of-loss

When the training data are larger than the available memory, we can feed them to neural network training in multiple batches. This notebook demonstrates how to do so and visualizes the training and test losses.
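A minimal sketch of the batch-reading idea, assuming the training data sit in a CSV file with a header row, numeric feature columns, and the label in the last column (this layout is an assumption, not taken from the notebook). The hypothetical helper `iter_batches` streams the file and yields one batch at a time, so only `batch_size` rows are held in memory:

```python
import csv
from typing import Iterator, List, TextIO, Tuple


def iter_batches(f: TextIO, batch_size: int) -> Iterator[Tuple[List[List[float]], List[float]]]:
    """Yield (features, labels) batches from a CSV stream without loading it all.

    Hypothetical helper: assumes a header row, numeric columns, label last.
    """
    reader = csv.reader(f)
    next(reader, None)  # skip the header row (no-op on an empty file)
    xs: List[List[float]] = []
    ys: List[float] = []
    for row in reader:
        values = [float(v) for v in row]
        xs.append(values[:-1])
        ys.append(values[-1])
        if len(xs) == batch_size:
            yield xs, ys
            xs, ys = [], []  # start a fresh batch; the yielded lists are untouched
    if xs:  # final, possibly smaller, batch
        yield xs, ys
```

In a training loop, each epoch would reopen the file and pass every batch to a per-batch training call, for example Keras's `model.train_on_batch(x, y)`, then evaluate the test loss the same way.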

build-a-chatbot-powered-by-gpt-3.5-using-streamlit

https://chatbot-v2.streamlit.app/

calculate-semiconductor-chip-yield-against-defect-density-with-monte-carlo-simulation

Calculating semiconductor chip yield against defect density using a Monte Carlo simulation is a common approach to assess the impact of defects on chip manufacturing. In this simulation, we'll randomly generate defect locations and evaluate chip yield based on specified criteria.
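A minimal sketch of such a simulation under simplifying assumptions of our own: a square grid of identical dies, defects dropped uniformly at random, and the criterion that a die is good only if it receives zero defects. Under those assumptions the result should track the Poisson yield model Y = exp(-D0·A), where D0 is the defect density and A the die area:

```python
import math
import random


def simulate_yield(defect_density: float, die_area: float,
                   dies_per_side: int = 20, trials: int = 200,
                   seed: int = 0) -> float:
    """Estimate chip yield: the fraction of dies that receive zero defects.

    defect_density is in defects per unit area and die_area is the area of one
    die in the same units. Assumes a square grid of dies and uniformly placed
    defects (a simplification of real wafer maps).
    """
    rng = random.Random(seed)
    n_dies = dies_per_side ** 2
    # Fixed defect count for the whole grid, matching the mean density.
    n_defects = round(defect_density * die_area * n_dies)
    good = 0
    for _ in range(trials):
        defective = set()
        for _ in range(n_defects):
            # A uniform (x, y) location only matters through the die it lands in.
            row = int(rng.random() * dies_per_side)
            col = int(rng.random() * dies_per_side)
            defective.add((row, col))
        good += n_dies - len(defective)
    return good / (trials * n_dies)


if __name__ == "__main__":
    for d0 in (0.1, 0.5, 1.0, 2.0):
        mc = simulate_yield(d0, die_area=1.0)
        print(f"D0={d0:.1f}  Monte Carlo={mc:.3f}  Poisson={math.exp(-d0):.3f}")
```

Clustered defects, edge exclusion, or a "killer defect" probability would be natural extensions of the yield criterion.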

cnn-for-digits-recognition

This is a CNN tutorial for beginners about a digits recognition model trained on the MNIST dataset. I built two models with TensorFlow/Keras and PyTorch/Skorch respectively.

comparison-between-randomforestclassifier-and-balancedrandomforestclassifier

Imbalanced data are common in the real world, especially in anomaly-detection tasks. Handling the imbalance is important; otherwise, the predictions are biased toward the majority class. BalancedRandomForestClassifier can handle imbalanced data without requiring separate resampling techniques such as SMOTE.
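The key idea behind BalancedRandomForestClassifier (from the imbalanced-learn library) is that each tree is trained on a bootstrap sample in which the majority class has been randomly under-sampled, so every class is equally represented. The pure-Python sketch below shows only that sampling step; the helper `balanced_bootstrap` is hypothetical and is not the library's code:

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Row indices for one tree's training set.

    Every class is sampled with replacement as many times as the minority class
    has rows, so the majority class is randomly under-sampled.
    """
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample: List[int] = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))  # bootstrap within each class
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    labels = [0] * 95 + [1] * 5  # 95:5 imbalance
    rng = random.Random(42)
    for tree in range(3):
        idx = balanced_bootstrap(labels, rng)
        print(tree, Counter(labels[i] for i in idx))  # 5 rows of each class
```

In practice, the library class is used like any scikit-learn classifier, e.g. `BalancedRandomForestClassifier(n_estimators=100).fit(X, y)`.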

data-crawler-for-imdb-top-250-movies

Scrape movie titles, release year, director, cast, rating, users rated, and href from https://www.imdb.com/chart/top/?ref_=nv_mv_250 using Python and Beautiful Soup.

dimension-reduction-by-dropping-high-vif-features-recursively

This Jupyter notebook demonstrates a dimension reduction method by dropping high variance-inflation-factor (VIF) features recursively.

eda-plots-for-classification

This notebook demonstrates the charts I usually plot for exploratory data analysis for classification tasks.

eda-plots-for-regression

This notebook demonstrates the charts I usually plot for exploratory data analysis for regression tasks.

estimate-the-area-of-a-region-using-a-monte-carlo-simulation

Monte Carlo simulation is a computational technique that uses random sampling and statistical methods to estimate the behavior of complex systems or solve problems. It is particularly useful when dealing with problems that involve a high degree of randomness or complexity.
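A minimal sketch of area estimation by random sampling, using the unit disk x² + y² ≤ 1 as the example region because its true area (π) is known. Points are drawn uniformly in a bounding box, and the fraction that lands inside the region is scaled by the box area; the function name `estimate_area` is our own:

```python
import random
from typing import Callable


def estimate_area(inside: Callable[[float, float], bool],
                  xmin: float, xmax: float, ymin: float, ymax: float,
                  n: int = 100_000, seed: int = 0) -> float:
    """Estimate the area of the region where inside(x, y) is True."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if inside(x, y):
            hits += 1
    box_area = (xmax - xmin) * (ymax - ymin)
    return box_area * hits / n  # box area times the hit fraction


if __name__ == "__main__":
    # Unit disk inside the square [-1, 1] x [-1, 1]; the exact area is pi.
    print(estimate_area(lambda x, y: x * x + y * y <= 1.0, -1, 1, -1, 1))
```

The statistical error shrinks roughly as 1/√n, so quadrupling `n` about halves it. Any region works as long as it can be tested point by point and fits inside the box.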

exploratory-data-analysis-using-ydata_profiling

ydata_profiling is a library that helps data scientists quickly review a dataset and find information and patterns in it. This Jupyter notebook shows an example of using ydata_profiling to do so.

fine-tune-bert-for-sentiment-analysis

BERT is an NLP model developed by Google Research in 2018; when it was introduced, it achieved state-of-the-art accuracy on several NLP tasks. This notebook demonstrates fine-tuning BERT for sentiment analysis.

high-throughput-file-search-engine

This is a "happy wife, happy life" project. My wife's work involves repetitive and tiresome file searches on her hard drive. To bring more joy and efficiency into her work life, I developed a file search tool. With its intuitive interface, my wife can quickly locate the files she needs without searching manually.
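The project's actual search criteria and interface are not described here. As an illustration only, the core of a file-name search can be sketched with the standard library: the hypothetical `find_files` walks a directory tree with `os.walk` and matches names case-insensitively against a shell-style pattern with `fnmatch`:

```python
import fnmatch
import os
from typing import List


def find_files(root: str, pattern: str) -> List[str]:
    """Return sorted paths under root whose file name matches pattern.

    Matching is case-insensitive, e.g. "report*.xlsx" also finds "Report_1.XLSX".
    """
    pattern = pattern.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name.lower(), pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

A GUI (for example, one built with tkinter) could then collect the folder and pattern from the user and display the returned paths.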

high-throughput-whitespace-trimmer-for-images

This Python program provides a high-throughput solution to trim whitespace margins in images.
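A minimal sketch of the trimming step, assuming a grayscale image given as a list of pixel rows with values from 0 to 255 and treating any pixel at or above a threshold (250 here, an assumed value) as whitespace. The hypothetical `trim_whitespace` finds the bounding box of non-white pixels and crops to it; with Pillow, the equivalent crop would be `Image.crop((left, top, right, bottom))`:

```python
from typing import List

Image2D = List[List[int]]


def trim_whitespace(img: Image2D, threshold: int = 250) -> Image2D:
    """Crop away margin rows/columns whose pixels are all >= threshold."""
    # Rows and columns that contain at least one non-white pixel.
    rows = [r for r, row in enumerate(img) if any(p < threshold for p in row)]
    if not rows:
        return []  # the image is entirely white
    cols = [c for c in range(len(img[0])) if any(row[c] < threshold for row in img)]
    top, bottom = rows[0], rows[-1] + 1  # bottom/right are exclusive bounds
    left, right = cols[0], cols[-1] + 1
    return [row[left:right] for row in img[top:bottom]]
```

For high throughput, the same function could be mapped over many image files in parallel, for example with `concurrent.futures.ProcessPoolExecutor().map`.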

histogram-of-an-image-and-its-heatmap

A histogram of an image provides valuable insights into the distribution of pixel intensities within that image. This notebook briefly shows how to plot the histogram and then replots the picture as a heatmap based on its pixel intensities.
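A minimal sketch of how the histogram counts are computed, assuming an 8-bit grayscale image stored as a list of pixel rows. The notebook presumably uses a plotting library for the charts; here only the binning is shown, with a function name of our own:

```python
from typing import List


def intensity_histogram(img: List[List[int]], bins: int = 256) -> List[int]:
    """Count pixels per intensity bin for 8-bit values in 0..255."""
    counts = [0] * bins
    width = 256 / bins  # intensity range covered by each bin
    for row in img:
        for p in row:
            counts[int(p // width)] += 1
    return counts
```

The counts can be drawn as a bar chart, and the heatmap view corresponds to rendering the pixel array itself with a colormap, e.g. `matplotlib.pyplot.imshow(img, cmap="hot")`.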

hyperparameter-tuning-for-logisticregression-knn-bagging-extratrees-xgboost-svm

Hyperparameter tuning for LogisticRegression, KNeighborsClassifier, BaggingClassifier, ExtraTreesClassifier, XGBClassifier, and SVC.

hyperparameter-tuning-with-a-custom-scoring-metric

This Jupyter notebook demonstrates tuning the hyperparameters of machine learning models with total profit as the scoring metric, so that the selected model maximizes total profit.
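A minimal sketch of a profit-based score for a binary classifier. The unit economics below (each true positive earns 100, each false positive costs 20) are hypothetical placeholders, not the notebook's figures. A function with this `(y_true, y_pred)` signature can be wrapped with scikit-learn's `make_scorer` and passed as `scoring=` to `GridSearchCV`:

```python
from typing import Sequence

# Hypothetical unit economics, for illustration only.
GAIN_TP = 100.0  # profit from acting on a correctly predicted positive
COST_FP = 20.0   # cost of acting on a false alarm


def total_profit(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Total profit from acting on every predicted positive (label 1)."""
    profit = 0.0
    for t, p in zip(y_true, y_pred):
        if p == 1:
            profit += GAIN_TP if t == 1 else -COST_FP
    return profit
```

Because the score depends on the business costs rather than on accuracy, the best hyperparameters under this metric can differ from those chosen by accuracy or F1.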

image-processing-and-optical-character-recognition-with-tesseract

This Python program pre-processes images and recognizes the characters in them (OCR) with pytesseract in batch mode.

impute-missing-data-with-knnimputer-and-iterativeimputer

When a significant amount of data is missing, what can we do? Impute the missing values with the mean or median? Scikit-learn provides two more powerful imputers, KNNImputer and IterativeImputer, which can do this work effectively.
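To illustrate the idea behind KNNImputer, here is a simplified pure-Python sketch, not scikit-learn's implementation: a missing value (None) is replaced by the mean of that feature over the k nearest rows, with distance computed only on features present in both rows. scikit-learn's version uses a `nan_euclidean` distance that additionally rescales for the missing coordinates:

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Fill None entries with the mean of the k nearest rows that have the value."""

    def distance(a: Row, b: Row) -> float:
        # Euclidean distance over coordinates observed in both rows.
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared)) if shared else math.inf

    filled = []
    for i, row in enumerate(data):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = [r for m, r in enumerate(data) if m != i and r[j] is not None]
            if not donors:
                raise ValueError(f"feature {j} has no observed values")
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            new_row.append(sum(r[j] for r in nearest) / len(nearest))
        filled.append(new_row)
    return filled
```

IterativeImputer takes a different route: it models each feature with missing values as a function of the other features and repeats this over several rounds. In scikit-learn it must first be enabled with `from sklearn.experimental import enable_iterative_imputer`.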

impute-missing-data-with-xgboost

When a significant amount of data is missing in highly important features, what can we do? Impute the missing values with the mean or median? In this Jupyter notebook, I demonstrate embedding an XGBoost model in the data transformer to impute the missing data.

increase-the-density-of-data-by-interpolation

Increase the density of data by interpolation.
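A minimal sketch of densifying a 1-D series by linear interpolation, assuming the x values are sorted. The hypothetical `densify` inserts `factor - 1` evenly spaced points between each pair of neighbours. Library routines such as `numpy.interp` or `scipy.interpolate.interp1d` do the same job; which one the notebook uses is not stated:

```python
from typing import List, Sequence, Tuple


def densify(xs: Sequence[float], ys: Sequence[float],
            factor: int) -> Tuple[List[float], List[float]]:
    """Return a series with factor - 1 linearly interpolated points between neighbours."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    new_x: List[float] = []
    new_y: List[float] = []
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        for step in range(factor):
            t = step / factor  # fraction of the way from point 0 to point 1
            new_x.append(x0 + t * (x1 - x0))
            new_y.append(y0 + t * (y1 - y0))
    new_x.append(xs[-1])  # keep the final original point
    new_y.append(ys[-1])
    return new_x, new_y
```

A series of n points becomes (n - 1)·factor + 1 points. Linear interpolation adds no new information; smoother schemes such as cubic splines are an alternative when the underlying curve is known to be smooth.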

inference-of-a-picture-with-resnet-models

ResNet models are widely used pre-trained computer vision models. This notebook demonstrates how to infer the object in a picture with ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152.

interpret-feature-importance-using-shap

SHAP is a popular tool for interpreting feature importance in machine learning models. This Jupyter notebook gives a demonstration.

linear-regression-and-its-regularizations

Linear regression models are widely used in industry for regression tasks because they are straightforward and easy to interpret. To capture non-linear patterns in data, polynomial features need to be added. However, high-degree polynomial features lead to overfitting. To solve this problem, a regularization term can be added to the loss function.
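In symbols, with n samples, p features (including the added polynomial terms), weights w, intercept b, and regularization strength λ ≥ 0, the two most common regularized losses are ridge (L2) and lasso (L1). The notation here is ours, not necessarily the notebook's:

```latex
\begin{aligned}
J_{\text{ridge}}(\mathbf{w}, b) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \mathbf{w}^\top\mathbf{x}_i - b\bigr)^2 + \lambda\sum_{j=1}^{p} w_j^2 \\
J_{\text{lasso}}(\mathbf{w}, b) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \mathbf{w}^\top\mathbf{x}_i - b\bigr)^2 + \lambda\sum_{j=1}^{p} \lvert w_j\rvert
\end{aligned}
```

Libraries scale these terms differently (for example, scikit-learn's `Lasso` divides the squared error by 2n, while `Ridge` uses the plain sum of squares), but the effect is the same: a larger λ shrinks the weights of high-degree terms, and the L1 penalty can drive some of them exactly to zero. The intercept b is usually left unpenalized.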

make-powerpoint-slides-with-python

With the python-pptx library, we can automate the updating of PowerPoint slides.

matrix-factorization-with-svd-nmf-and-gradient-descent

Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices.
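A minimal sketch of the gradient-descent variant, assuming a small user-item rating matrix in which 0 marks an unobserved rating (an assumption for illustration). The rating matrix R (users × items) is approximated by P·Qᵀ with k latent factors, and stochastic gradient descent with L2 regularization updates P and Q using only the observed entries. The function names and hyperparameter values are our own choices:

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def factorize(R: Matrix, k: int = 2, lr: float = 0.01, reg: float = 0.02,
              epochs: int = 5000, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit R ~ P @ Q^T on the observed (non-zero) entries with SGD."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u][i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u][i] - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P: Matrix, Q: Matrix) -> Matrix:
    """Full predicted matrix P @ Q^T, including the unobserved cells."""
    return [[sum(p * q for p, q in zip(pu, qi)) for qi in Q] for pu in P]
```

The filled-in cells of `predict(P, Q)` serve as recommendations. Plain `numpy.linalg.svd` and scikit-learn's `NMF` expect a complete matrix, so with sparse ratings the missing cells must first be filled in some way; fitting only the observed entries avoids that step.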

monte-carlo-integration

Monte Carlo integration is particularly useful when dealing with high-dimensional integrals or integrals over complex, irregularly shaped domains where traditional methods may be impractical. It's widely used in various fields, including physics, finance, and engineering, for solving problems involving numerical integration.
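A minimal sketch of plain Monte Carlo integration over a box. The 3-D test integrand f(x, y, z) = x + y + z on [0, 1]³ is our own example, chosen because its exact integral is 1.5. The estimate is the box volume times the average of f at uniformly sampled points, and its standard error shrinks like 1/√N regardless of the dimension:

```python
import math
import random
from typing import Callable, Sequence, Tuple


def mc_integrate(f: Callable[[Sequence[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Return (estimate, standard_error) of the integral of f over a box."""
    rng = random.Random(seed)
    volume = math.prod(hi - lo for lo, hi in bounds)
    total = total_sq = 0.0
    for _ in range(n):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        value = f(point)
        total += value
        total_sq += value * value
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # guard tiny negative round-off
    return volume * mean, volume * math.sqrt(variance / n)


if __name__ == "__main__":
    estimate, std_err = mc_integrate(lambda p: sum(p), [(0.0, 1.0)] * 3)
    print(f"{estimate:.4f} +/- {std_err:.4f}  (exact 1.5)")
```

For an irregularly shaped domain, one can integrate over a bounding box with f multiplied by an indicator that is 1 inside the domain and 0 outside.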

neural-network-models-for-multiclass-classcification

TensorFlow/Keras and PyTorch/Skorch models for multiclass classification, hyperparameter tuning, and model evaluation.

neural-network-models-without-using-wrappers

Keras and Skorch provide wrappers that simplify building neural network models. However, the wrappers sacrifice some of the models' flexibility. In some scenarios, such as early stopping and batch reading, building plain neural network models without wrappers is still very useful.
