hanfei1986 Goto Github PK
Name: Fei Han
Type: User
Bio: Senior Principal Data Scientist | Machine Learning and Generative AI | Semiconductor Specialist | Physicist
Location: New Jersey
Name: Fei Han
Type: User
Bio: Senior Principal Data Scientist | Machine Learning and Generative AI | Semiconductor Specialist | Physicist
Location: New Jersey
Bar chart race is an elegant animation that depicts the progress of multiple categories over time. We can create them in Python.
When training data is bigger than memory, we can feed the training data to neural network training in multiple batches. This notebook demostrates how to do it and visualizes the training and test losses.
https://chatbot-v2.streamlit.app/
Calculating semiconductor chip yield against defect density using a Monte Carlo simulation is a common approach to assess the impact of defects on chip manufacturing. In this simulation, we'll randomly generate defect locations and evaluate chip yield based on specified criteria.
This is a CNN tutorial for beginners about a digits recognition model trained on the MNIST dataset. I built two models with TensorFlow/Keras and PyTorch/Skorch respectively.
Imbalanced data commonly exist in real world, especially in anamoly-detection tasks. Handling imbalanced data is important to the tasks, otherwise the predictions are biased towards the majority class. BalancedRandomForestClassifier can deal with the imbalanced data without knowing any novel techniques like SMOTE.
Scrape movie titles, release year, director, cast, rating, users rated, and href from https://www.imdb.com/chart/top/?ref_=nv_mv_250 using Python and Beautiful Soup.
This Jupyter notebook demonstrates a dimension reduction method by dropping high variance-inflation-factor (VIF) features recursively.
This notebook demonstrates the charts I usually plot for exploratory data analysis for classification tasks.
This notebook demonstrates the charts I usually plot for exploratory data analysis for regression tasks.
Monte Carlo simulation is a computational technique that uses random sampling and statistical methods to estimate the behavior of complex systems or solve problems. It is particularly useful when dealing with problems that involve a high degree of randomness or complexity.
Ydata_profiling is a library to help data scientists quickly review data and find information and patterns in the data. This Jupyter notebook shows an example of using ydata_profiling to do so.
BERT is an NLP model developed by Google Research in 2018, after its inception it has achieved state-of-the-art accuracy on several NLP tasks. This notebook demonstrates fine tuning BERT for sentiment analysis.
This is a "happy wife, happy life" project. My wife's work involves repetitive and tiresome file searches on her hard drive. To bring more joy and efficiency into her work life, I've developed an innovative solution. By utilizing its intuitive interface, my wife can swiftly locate the files she needs without the hassle of manual searching.
This Python program provides a high-throughput solution to trim whitespace margins in images.
A histogram of an image provides valuable insights into the distribution of pixel intensities within that image. This notebook gives a brief about how to plot the histogram. Furtherly, we can replot the picture with a heatmap based on its pixel intensities.
Hyperparameter tuning for LogisticRegression, KNeighborsClassifier, BaggingClassifier, ExtraTreesClassifier, XGBClassifier, and SVC.
This Jupyter notebook demonstrates tuning hyperparameters of machine learning models with total profit as a scoring metric to gain maximum total profit.
This Python program is used to pre-process images and recognize characters in them (OCR) with pytesseract in a batch-processing way.
When signaficant amount of data are missing, what can we do? Impute the missing data with mean or median? Actually, Scikit-Learn provides two powerful imputers, KNNImputer and IterativeImputer, which can do this work effectively.
When signaficant amount of data in highly-important features are missing, what can we do? Impute the missing data with mean or median? In this Juyter notebook, I demonstrate embedding a XGBoost model to do the data imputation in the data transformer.
Increase the density of data by interpolation.
ResNet models are lightweight computer vision pre-trained models. This notebook demostrates how to infer the object in a picture with ResNet18, ResNet34, ResNet50, ResNet101, and ResNet251.
SHAP is a fancy tool for interpreting feature importance in machine learning tasks. This Jupyter notebook gives a demonstration.
Linear regression model is widely used in industry for regression tasks as it is straightforward and easy to interpret. To capature non-linear patterns in data, polynomial features need to be added. However, high-degree polynomial features lead to overfitting. To solve the problem, regularizations can be added to the loss function.
With the python-pptx library, we can automate the updating of PowerPoint slides.
Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower dimensionality rectangular matrices.
Monte Carlo integration is particularly useful when dealing with high-dimensional integrals or integrals over complex, irregularly shaped domains where traditional methods may be impractical. It's widely used in various fields, including physics, finance, and engineering, for solving problems involving numerical integration.
Tensorflow/Keras and Pytorch/Skorch models for multiclass classification, hyperparameter tuning, and model evaluation.
Keras and Starch provide us wrappers which simplify building neural network models. However, the wrappers sacrifice the flexibility of the models. In some scenarios like early stopping and batch reading, building pristine neural network models is still very useful.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.