GithubHelp home page GithubHelp logo

Hi there πŸ‘‹ I'm Louis

INTRO VIDEO

Screen Shot 2023-02-08 at 11 23 20 AM

I am a storyteller who is passionate about data, writing, public speaking and building community. I hold a special place in my heart for Natural Language Processing (spaCy, NLTK,) and inferential modeling (statsmodels, scikit-learn, and using SHAP or LIME explainers.) Additionally, I have experience with data wrangling, scraping, and engineering, (pandas, selenium,) and using machine learning algorithms (decision tree/Random Forest, Linear and Logistic regression analysis.) I also adore making visualizations (seaborn, matplotlib,) and using maps (folium.) I have experience using APIs (Google, Yelp, Twitter) as well as using public data sets (NYC Open Portal.) A firm believer in a well-written code comment, accessible documentation, and better coding with functions.

Projects

Pumpkin_Spice (Time series analysis and Modeling)

ReadMe

Main Code Notebook

Time series analysis with naive, SARIMA and ETS modeling.
  • Google trends last 5 years of data with the "pumpkin spice" search term
  • Performed both additive and multiplicative time series decomposition of data with explinations
  • Tested naive models that use last year and last week's data as today's direct prediction
  • Tested SARIMA and ETS itterations of models with terms based on both ACF and PACF plots, as well as an itterative approach with PMD Auto Arima
  • Best model was an ETS model with MAPE: 12.118428034881385, RMSE: 4.429939957414803 which is roughly 2-3 times better than using last year's data
  • Downloaded final model and will verify last year's data in one year for verification blindness.

HurriHelp (Natural Language Processing Task and Modeling)

ReadMe

Main Code Notebook

Natural Language Processing project using sentiment analysis to help find Hurricane Ian survivors in distress, and provide them with links to National Disaster Distress Helpline and FEMA Info.
  • Scraped Twitter for over seven thousand original tweets
  • Used three different sentiment analysis analyzers (TextBlob, VADER, and a distillBERT model) to find the sentiment of tweets
  • Used analysis to explore the most common words in negative and positive sentiments.
  • Used Random Forest, Naive Bayes, XGBoost, and Catboost on vectorized tweets for 80% precision in the best-tuned model
  • Used Lime Text Explainer for the inferential understanding of the most common words in data, and found words most likely to be in negative tweets

Fetal Health Project (Catigorization Task and Modleing)

ReadMe

Main Code Notebook

Health information project for early detection and faster diagnosis of pathological fetal heartbeats.
  • Built data pipelines and created two early detection algorithms ready for A/B testing
  • Tested Logistic Regression, K-Nearest-Neighbors, Support Vector Machine, and Random Forest for the highest recall of pathological class of 92%
  • Used OneVsRest wrapper and GridSearchCV for model specialized to pathological class and best-tuned machine.
  • Used Shap Values for the inferential understanding of the model, and reduced dimensionality of the algorithm by five input features for 90% recall

Seattle Section 8 Expansion Project (Regression Task and Modeling)

ReadMe

Code Notebook

Built inferential linear regression model to inform the city of Seattle on where to build new public housing.
  • Performed in-depth EDA, determining the effect of renovations over time on property values for efficient budgeting.
  • Used Google’s geocoding API and selenium to engineer features: how close each property is to the closest public school, hospital, and police station for infrastructure insights related to property values
  • Visualized multicollinearity in seaborn with correlation heat maps, for feature selection
  • Iteratively constructed eight versions of the linear regression model, removing features with low P Values for an R-Squared score of 83% Produced automated findings report for human analysis of coefficients, including inferential analysis of inequity by zip code

Film Analysis Project (Exploritory Data Analysis)

ReadMe

Code Notebook

Advised hypothetical Microsoft Studios on what kinds of feature films to produce by analyzing box office financial data.
  • Aggregated data using Pandas and visualized in Seaborn to discover the most popular genres by net profit: Animation, Adventure, Fantasy and Family films
  • Pivoted Table to aggregate films of most popular genres by release date to infer seasonal trends: best months for releasing new films in most popular genres to be June, July, May, March, November, and April
  • Inferred best run time for new films is between 120-150 minutes based on historical popularity

Stop And Frisk Project

Tableau Dashboard,

ReadMe

Code Notebook

Tableau Public dashboard using demographic information from 2021 for public transparency.

Geospacial Vizualization

ReadMe

First Code Notebook

In-depth analysis, static geocoding visualization of consent-asked for rates in 2020 for public transparency.
  • Visualized frequencies of stops and consent-asked in 2020 NYPD stop and frisk data towards understanding inequities in policing.
  • Investigated anomalies and outliers, patterns, and relationships for insights into different zip codes and communicate results with choropleth maps and graphs made in Folium and Seaborn
  • Resulting artical Using Folium On Police Data published by Towards Data Science

LINKS

Louis Casanave's Projects

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.