My name is Pablo Gomes de Miranda.
I am actively seeking professional opportunities as a Data Scientist, with a particular interest in roles where I can utilize data to help companies make informed decisions that drive positive outcomes.
While my educational background includes a bachelor's, master's, and PhD in different fields of Humanities, I am currently focusing on expanding my knowledge and skills in various tools used in Data Science. I am actively working on projects to build a portfolio that showcases my abilities.
I believe that my extensive experience in Education and History has equipped me with strong communication skills and the ability to offer unique solutions in the field of Data Science.
This is a clustering project in which we segmented customers for DataSmart, a fictitious e-commerce company, in order to create a loyalty program called Insiders. The segmentation followed an RFM logic, where Recency is the time since the last purchase (the customer's responsiveness), Frequency is the time between transactions (their engagement on the platform), and Monetary is the total revenue and the high-value purchases made. Using data available on Kaggle, we carried out an end-to-end project with deployment on AWS and selected a cluster of 86 customers with an average gross revenue of US$4,179.93.
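As a rough illustration of the RFM logic described above, the sketch below builds a small RFM table using only the Python standard library. The transactions, the `rfm_table` helper, and the reference date are hypothetical and are not taken from the project. Frequency is simplified here to a purchase count, which is an assumption; the project describes it in terms of time between transactions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2023, 1, 5), 120.0),
    ("c1", date(2023, 1, 20), 80.0),
    ("c2", date(2022, 12, 1), 40.0),
    ("c3", date(2023, 1, 25), 500.0),
    ("c3", date(2023, 1, 28), 300.0),
    ("c3", date(2023, 1, 30), 200.0),
]


def rfm_table(rows, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Assumption: frequency is the number of purchases, a simplification
    of the "time between transactions" definition used in the project.
    """
    last_purchase = {}
    counts = defaultdict(int)
    revenue = defaultdict(float)
    for customer, day, amount in rows:
        # Keep the most recent purchase date seen for each customer.
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        counts[customer] += 1
        revenue[customer] += amount
    return {
        customer: (
            (reference_date - last_purchase[customer]).days,
            counts[customer],
            revenue[customer],
        )
        for customer in last_purchase
    }


rfm = rfm_table(transactions, reference_date=date(2023, 1, 31))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

A table like this would typically be scaled and passed to a clustering algorithm; which algorithm and scaling the project used is not stated here.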
Tools used:
- Python 3.10.10;
- VS Code;
- Jupyter Notebook;
- YData-Profiling;
- Metabase;
- SQL: SQLite and PostgreSQL;
- Git and GitHub;
- Amazon Web Services: S3, RDS and EC2.
This is a Learning to Rank (LTR) project whose objective is to classify and rank clients interested in purchasing vehicle insurance. SafeHarbor Insurance is a fictitious insurance company that we made up in order to provide a business context for the problem. The data come from the Kaggle challenge Health Insurance Cross Sell Prediction. We perform an exploratory data analysis, train different classification Machine Learning models, evaluate their metrics, and test their results.
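To make the ranking idea concrete, here is a minimal sketch that sorts clients by a predicted interest score and measures precision and recall among the top k. The scores and labels are made up for illustration, and precision/recall at k is one common way to evaluate such a ranking, not necessarily the metric used in this project.

```python
def rank_by_score(scores, labels):
    """Pair each score with its label and sort from highest to lowest score."""
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)


def precision_at_k(scores, labels, k):
    """Share of truly interested clients among the k highest-scored ones."""
    top_k = rank_by_score(scores, labels)[:k]
    return sum(label for _, label in top_k) / k


def recall_at_k(scores, labels, k):
    """Share of all interested clients that appear in the top k."""
    top_k = rank_by_score(scores, labels)[:k]
    return sum(label for _, label in top_k) / sum(labels)


# Hypothetical model outputs: probability of interest per client,
# and the true outcome (1 = interested, 0 = not interested).
scores = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05]
labels = [1, 0, 1, 0, 0, 1]

print(precision_at_k(scores, labels, k=3))
print(recall_at_k(scores, labels, k=3))
```

In a cross-sell setting, k would usually correspond to the number of clients the sales team can contact.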
Tools used:
- Python 3.10.10;
- VS Code;
- Jupyter Notebook;
- PostgreSQL;
- Git and GitHub;
- Render Cloud;
- Flask;
- Google Sheets Apps Script.
This is a Classification project where we were hired to develop a model that could help a medical company detect the onset of cardiovascular diseases among patients. Medical data was collected from Kaggle, and in the end, we achieved a classification model that, in the worst-case scenario, with 72% precision, could bring a return of US$ 175,000,000.00, and in the best-case scenario, with 78% precision, a profit of US$ 210,000,000.00 could be expected.
Tools used:
- Python 3.10.8;
- VS Code;
- Jupyter Notebook;
- Git and GitHub.
This is a regression project for sales forecasting, in which we predict the sales of a European pharmaceutical company, Dirk Rossmann GmbH. The data was collected via Kaggle from the Rossmann Store Sales competition. After an exploratory data analysis, we used an algorithm called Boruta to select the best features for an XGBoost Regressor Machine Learning model. We achieved an average sales prediction of €285,338,016.00 for the next six weeks and implemented the solution in a way that is easily accessible to the company's business team.
Tools used:
- Python 3.9.13
- VS Code
- Jupyter Notebook
- Heroku: Cloud Application Platform
- Telegram Messenger
This is an exploratory data analysis (EDA) project whose objective is to generate insights that answer two simple questions asked by House Rocket, a fictitious real estate company. Given a list of properties:
- which ones should be acquired and
- what are the sales conditions to obtain the highest profit.
Tools used:
- Python 3.9.13
- VS Code
- Jupyter Notebook
- Streamlit
- Streamlit Community Cloud
We answered both questions by delivering two CSV files containing a list of 157 properties that the company can acquire at a reasonable price and sell in different seasons at a good profit. If House Rocket acquires and sells all the suggested properties, a total profit of US$24,222,890.20 can be expected.
This is an exercise to understand the basics of Python, practice data manipulation, and get a grip on the libraries and packages of this programming language. We also practiced code versioning in both local and remote repositories. The goal was to produce a list of motorcycles, according to a series of specifications, that a company could purchase and resell at a profit.
Tools used:
- Python 3.10.8;
- VS Code;
- Jupyter Notebook;
- Git and GitHub;
- Streamlit Cloud.
This is a simple dashboard built with Microsoft Power BI to demonstrate my data manipulation skills and my ability to prepare dashboards with the appropriate tools. The data used was collected from a real survey conducted by a YouTube channel.
Tools used:
- Microsoft Power BI;
- Microsoft Excel;
- GitHub.