pgdemiranda,Pablo Miranda,github

Hello there,

My name is Pablo Gomes de Miranda.

I am actively seeking professional opportunities as a Data Scientist, with a particular interest in roles where I can utilize data to help companies make informed decisions that drive positive outcomes.

While my educational background includes a bachelor's, master's, and PhD in different fields of Humanities, I am currently focusing on expanding my knowledge and skills in various tools used in Data Science. I am actively working on projects to build a portfolio that showcases my abilities.

I believe that my extensive experience in Education and History has equipped me with strong communication skills and the ability to offer unique solutions in the field of Data Science.

Data Science Projects:

Insiders

This is a clustering project in which we segmented customers for DataSmart, a fictitious e-commerce company, in order to create a loyalty program called Insiders. The planned segmentation followed an RFM logic: Recency is the time since the last purchase and reflects the responsiveness of our customers; Frequency is the time between transactions and reflects their engagement on the platform; Monetary is the total revenue and reflects which high-value purchases were made. Using data available on Kaggle, we carried out an end-to-end project with deployment on AWS, in which we selected a cluster of 86 customers with an average gross revenue of US$4,179.93.
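As an illustration only, the RFM features that feed a segmentation like this one can be sketched in plain Python. The transactions, customer ids, and the `rfm_table` helper are hypothetical and are not taken from the project. Frequency is counted here as the number of purchases, a common simplification of the "time between transactions" reading used in the project.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 3, 1), 80.0),
    ("c2", date(2023, 11, 20), 40.0),
    ("c3", date(2024, 2, 10), 300.0),
    ("c3", date(2024, 2, 25), 150.0),
    ("c3", date(2024, 3, 10), 50.0),
]


def rfm_table(rows, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    counts = defaultdict(int)
    revenue = defaultdict(float)
    for customer, day, amount in rows:
        # Keep the most recent purchase date seen for each customer
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        counts[customer] += 1
        revenue[customer] += amount
    return {
        c: ((reference_date - last_purchase[c]).days, counts[c], revenue[c])
        for c in last_purchase
    }


rfm = rfm_table(transactions, reference_date=date(2024, 3, 15))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

A clustering algorithm would then be fitted on these three columns, usually after scaling them, to find groups such as the Insiders cluster.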

Tools used:

Python 3.10.10;
VS Code;
Jupyter Notebook;
YData-Profiling;
Metabase;
SQL: SQLite and PostgreSQL;
Git and GitHub;
Amazon Web Services: S3, RDS and EC2.

Health Insurance Cross Sell

This is a Learning to Rank (LTR) project whose objective is to classify and rank clients by their interest in purchasing vehicle insurance. SafeHarbor Insurance is a fictitious insurance company that we invented to provide a business context for the problem. The data comes from the Health Insurance Cross Sell Prediction challenge on Kaggle. We perform an exploratory data analysis, train different classification Machine Learning models, evaluate their metrics, and test their results.
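To make the ranking idea concrete, here is a minimal sketch of ordering clients by a predicted interest score and checking how many of the top-ranked ones are actually interested. The scores, client ids, and both helper functions are hypothetical, and precision at k is assumed here as the evaluation metric; the project may have used others.

```python
def rank_by_score(scores):
    """Return client ids sorted by predicted interest, highest first."""
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(ranked_ids, interested, k):
    """Fraction of the top-k ranked clients who are truly interested."""
    top_k = ranked_ids[:k]
    hits = sum(1 for client_id in top_k if client_id in interested)
    return hits / k


# Hypothetical model outputs: probability that each client wants the insurance
scores = {"a": 0.91, "b": 0.15, "c": 0.66, "d": 0.48, "e": 0.05}
# Hypothetical ground truth: clients who actually were interested
interested = {"a", "d"}

ranked = rank_by_score(scores)
p_at_2 = precision_at_k(ranked, interested, 2)
print(ranked, p_at_2)
```

With a ranked list like this, a sales team with a limited number of calls can contact the clients at the top first.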

Tools used:

Python 3.10.10;
VS Code;
Jupyter Notebook;
PostgreSQL;
Git and GitHub;
Render Cloud;
Flask;
Google Sheets Apps Script.

Cardiovascular Disease Detection

This is a classification project in which we were hired to develop a model that could help a medical company detect the onset of cardiovascular disease in patients. The medical data was collected from Kaggle. The final classification model could bring a return of US$175,000,000.00 in the worst-case scenario, with 72% precision, and a profit of US$210,000,000.00 in the best-case scenario, with 78% precision.

Tools used:

Python 3.10.8;
VS Code;
Jupyter Notebook;
Git and GitHub.

Sales Forecasting

This is a regression project for sales forecasting, in which we predict the sales of a European pharmaceutical company, Dirk Rossmann GmbH. The data was collected from the Rossmann Store Sales competition on Kaggle. After an exploratory data analysis, we used the Boruta algorithm to select the best features for an XGBoost Regressor Machine Learning model. We achieved an average sales prediction of €285,338,016.00 for the next six weeks and implemented the solution in a way that is easily accessible to the company's business team.

Tools used:

Python 3.9.13
VS Code
Jupyter Notebook
Heroku: Cloud Application Platform
Telegram Messenger

Data Analysis:

Insights Project

This is an exploratory data analysis (EDA) project that generates insights to answer two simple questions asked by a fictitious real estate company. Given a list of properties:

which ones should be acquired, and
what are the sales conditions that yield the highest profit?

Tools used:

Python 3.9.13
VS Code
Jupyter Notebook
Streamlit
Streamlit Community Cloud

We answered both questions by delivering two CSV files containing a list of 157 properties that the company can acquire at a reasonable price and sell in different seasons at a good profit. If House Rocket acquires and sells all the suggested properties, a total profit of US$24,222,890.20 can be expected.

Data Manipulation

This is an exercise to understand the basics of Python, practice data manipulation, and get a grip on the libraries and packages of this programming language. We also practiced code versioning in both local and remote repositories. The goal was to produce a list of motorcycles, according to a series of specifications, that a company could purchase and resell at a profit.

Tools Used:

Python 3.10.8;
VS Code;
Jupyter Notebook;
Git and GitHub;
Streamlit Cloud.

Data Professional Survey Breakdown

A simple dashboard built with Microsoft Power BI to demonstrate my data manipulation skills and my ability to prepare dashboards with the appropriate tools. The data comes from a real survey conducted by a YouTube channel.

Tools Used:

Microsoft Power BI;
Microsoft Excel;
GitHub.

You can reach me through my email or LinkedIn.