
Hi there 👋

About me 💬

  • My name is Raluca and I am a passionate Data Scientist with 3.5 years of experience in the reinsurance industry, working as a Senior Risk/Quantitative Analyst and building solutions that transform vast data landscapes into compelling business insights.
  • With a strong background in machine learning and data modeling (NLP and (un)supervised ML), I am proficient with an array of tools for data processing and analysis: Jupyter, Python, Airflow, AWS, MSSQL/PostgreSQL, Git.
  • I describe myself as a creative self-starter with a keen mind for solving tough problems with adaptive, data-driven and automated solutions.
  • I have a PhD in Political Science from Trinity College Dublin, where I conducted a survey-experiment in Kenya and used advanced statistical models (panel models, multilevel models) to assess the political impact of Chinese economic engagement in Africa.

Skills 💻

  • Data gathering (e.g. Python - requests, bs4/BeautifulSoup, boto3, pyodbc; survey-experiment; SQL - MySQL, Redshift, data lake such as Dremio; Excel - power query)
  • Data processing (EDA using Python - matplotlib, pandas, numpy, seaborn; Excel - formulas, pivot)
  • Modelling (R - lme4, ordinal, panelAR, plm; Python - scikit-learn, scipy, nltk, Tyche)
  • Data visualization (PowerBI, Tableau, R and Python - matplotlib, seaborn)
  • Orchestrating complex data pipelines (using Airflow)
  • Cloud infrastructure (AWS - S3, EC2, Redshift)

Languages and software that I know and/or use:

Python seaborn statsmodels Scikit-learn Pandas NumPy Keras

postgresql mysql sql

airflow AWS metabase mongodb jenkins Git VSCode

Projects 🚀

This section contains my projects, which span a variety of data science topics and use different libraries and tools.
The word cloud below summarizes the most frequent words used to describe my projects. It is regenerated on each change to the README using GitHub Actions. Feel free to check out the wordcloud-readme repo for more details about the generation process.

Wordcloud

Feel free to explore the links below to learn more about each project!



A. Complex projects - original ideas developed into end-to-end pipelines that combine several tools (e.g. Python, Spark, SQL)



🔥Hot project🔥: Cyber Attacks Analysis

Cyber Attacks

  • New project that leverages the University of Maryland CISSM Cyber Attacks Database
  • Aim: to create an end-to-end data engineering pipeline to ingest, process, store, and visualize data on cyber attacks around the world between 2014 and 2023.
  • The project will be submitted as a capstone project for the Data Engineering Zoomcamp 2024
  • Expectations: use cloud, IaC, workflow orchestration, data warehouse, and visualization tools to build a dashboard that shows the trends and patterns of cyber events and breaches across different sectors and regions.
  • Hard deadline: 1st of April 2024






TED talks NLP recommendation system

Keywords: Unstructured data, NLP, scikit-learn, sentiment analysis, similarity, Streamlit

Cohort: WaiPRACTICE September Cohort 2023 by Women in AI Ireland (WAI) (github page).

Summary: Built a content-based recommendation system using NLP techniques, sentiment analysis and similarity measures.

Key steps:


Next Steps:


Valheim's Steam user reviews analysis

Keywords: API requests, NLP, sentiment analysis, unstructured data

Summary: A project that analyzes user reviews of the game Valheim on Steam to understand why a game with such low-quality graphics has been so well received by players.

Key steps:

  • Collected reviews using the Steam public API.
  • Ran sentiment analysis using a pre-trained BERT transformer.
  • EDA uncovered user exploits.
  • Python libraries: pandas, numpy, matplotlib and seaborn

Next steps:


Root cause analysis for defects in production (root cause analysis, decision tree, neural networks)


Keywords: supervised ML, decision tree, random forest

Cohort: Women in Data Science Accelerator 2020 (Accenture)

Summary: Conducted a root cause analysis to predict defects in production using a decision tree model.

Key steps:

  • Software: R, Python, Tableau
  • Libraries: RPART, Boruta, Scikit-learn, Graphviz, dtreeviz

Next steps:




B. ML bits and Pieces - smaller-scale projects, usually consisting of one notebook or one script, meant to focus on a specific tool or ML aspect



Find movies' similarity (NLP, KMeans/Clustering, Unsupervised Learning)


Keywords: Movie Similarity, NLP, KMeans, Cosine Similarity, Clustering, Unsupervised Learning

Summary: An NLP project that quantifies the similarities between movies based on their IMDb and Wikipedia plots. It aims to provide insights into movie relationships and group them into meaningful clusters.

Key Steps:

  • Data Preprocessing using NLP techniques, such as Tokenization, Stemming and TF-IDF Vectorization
  • Performed unsupervised learning with KMeans: first determined the optimal number of clusters using the elbow method, then assigned movies to clusters.
  • Used cosine similarity to measure how close movie plots are to each other.
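
The TF-IDF and cosine-similarity steps can be illustrated with a minimal sketch. It uses only the Python standard library and is not the project's actual code: the function names, the simple regex tokenizer, the `tf * log(N / df)` weighting and the three sample plots are all assumptions made for illustration, and stemming is left out to keep it short.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up plot summaries, used only to exercise the functions.
plots = [
    "A young wizard attends a school of magic and fights a dark lord",
    "A wizard and a hobbit travel to destroy a magic ring and defeat a dark lord",
    "A shark terrorizes a small beach town during the summer",
]
vectors = tfidf_vectors(plots)
sim_fantasy = cosine_similarity(vectors[0], vectors[1])
sim_mixed = cosine_similarity(vectors[0], vectors[2])
print(sim_fantasy > sim_mixed)
```

Terms that occur in every document (here the word "a") get a weight of zero, so the two fantasy plots score as more similar to each other than either does to the shark plot.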

Next Steps:

  • Explore additional features (e.g., genre, director) for improved clustering.
  • Visualize clusters and explore movie recommendations within each cluster.

Hotel Bookings (SVM, classification, decision boundaries)


Keywords: support vector machine, classification, feature engineering, hyperparameter tuning

Summary: The project predicts whether a hotel booking will be canceled, using a support vector machine (SVM) classifier trained on a data set with information about each booking, such as lead time, average daily rate, number of weekend nights and arrival week number.

Key Steps:

  • Preprocessing the data by scaling the numerical features and creating new binary and interaction features
  • Selecting the most informative features based on mutual information scores
  • Tuning the SVM hyperparameters using grid search cross-validation
  • Evaluating the best model on the test set and plotting the decision boundaries for different kernels
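
The scaling, feature selection and grid-search steps above could be wired together roughly as follows. This is a hedged sketch that assumes scikit-learn is installed. The data comes from `make_classification` as a synthetic stand-in for the bookings table, and the parameter grid, the step names and the number of folds are illustrative choices, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the bookings data (NOT the real data set).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale -> keep the k features with the highest mutual information -> SVM.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=mutual_info_classif)),
        ("svm", SVC()),
    ]
)

# Illustrative grid; the values are assumptions, not the project's grid.
param_grid = {
    "select__k": [4, 8],
    "svm__kernel": ["linear", "rbf"],
    "svm__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# Accuracy of the best configuration on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the scaler and the selector inside the pipeline means they are refitted on each cross-validation fold, so no information from the validation folds leaks into preprocessing.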

Next Steps:

  • Compare the performance of the SVM classifier with other machine learning models, such as logistic regression, decision tree, or random forest
  • Explore the effect of different feature selection methods, such as chi-square test, ANOVA, or recursive feature elimination
  • Analyze the factors that influence the cancellation probability and provide recommendations to reduce it
  • Deploy the model as a web application or a dashboard that can interact with real-time data

Predicting crops based on soil metrics (neural network, tensorflow, keras, random forest classifier)


Keywords: crop type prediction, soil metrics, tensorflow, keras, scikit-learn, logistic regression, random forest classifier, neural network.

Summary: This project predicts the best crop type for a soil sample based on four soil metrics (N, P, K, and pH), using logistic regression, a random forest classifier and a neural network.

Key Steps:

  • Explores three machine learning algorithms: logistic regression, random forest, and a neural network from tensorflow
  • Evaluates each model's performance using metrics such as F1-score and confusion matrix from scikit-learn
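
To show what the two evaluation metrics measure, here is a small standard-library sketch of a confusion matrix and per-class F1. The project itself uses scikit-learn for this; the helper functions, crop labels and predictions below are invented for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def f1_per_class(matrix):
    """F1 = 2*TP / (2*TP + FP + FN), computed for each class."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted i, but wrong
        fn = sum(matrix[i]) - tp                 # true i, but missed
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Invented labels and predictions, used only to exercise the functions.
labels = ["rice", "maize", "coffee"]
y_true = ["rice", "rice", "maize", "maize", "coffee", "coffee"]
y_pred = ["rice", "maize", "maize", "maize", "coffee", "rice"]

matrix = confusion_matrix(y_true, y_pred, labels)
f1 = f1_per_class(matrix)
macro_f1 = sum(f1) / len(f1)
print(matrix)
print([round(score, 3) for score in f1], round(macro_f1, 3))
```

Macro-averaging gives each crop the same weight regardless of how many samples it has, which matters when some crop types are rare.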

Next Steps: Collect more data from different regions and seasons to validate the model on new data.



ML bits and Pieces

Keywords: Structured data, SQL, pandas, sqlite3, Data Analysis, Data Manipulation

Summary: Built a data exploration project using SQL techniques to analyze data from BusinessFinancing.co.uk on the world’s oldest businesses. The project involved creating a SQLite database, loading data from CSV files, and running SQL queries to gain insights into these historic businesses.

Key Steps:

  • Created a SQLite database and defined the schema using SQL commands.
  • Loaded data from CSV files into the database using pandas.
  • Ran SQL queries to merge and manipulate the data, and used pandas to analyze the results.
  • Libraries: sqlite3, pandas
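
The create, load and query flow above can be sketched like this. It uses `sqlite3` with the standard `csv` module instead of pandas so that it runs without extra dependencies. The table names, column names and placeholder rows are hypothetical and do not reflect the real BusinessFinancing.co.uk data.

```python
import csv
import io
import sqlite3

# Placeholder CSV content standing in for the project's CSV files.
BUSINESSES_CSV = """business,year_founded,country_code
Business A,1200,AAA
Business B,1500,BBB
Business C,1350,AAA
"""
COUNTRIES_CSV = """country_code,country
AAA,Country A
BBB,Country B
"""


def load_csv(conn, table, csv_text):
    """Insert every CSV row into an existing table with matching columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


conn = sqlite3.connect(":memory:")

# Define the (hypothetical) schema with plain SQL commands.
conn.executescript(
    """
    CREATE TABLE businesses (business TEXT, year_founded INTEGER, country_code TEXT);
    CREATE TABLE countries (country_code TEXT PRIMARY KEY, country TEXT);
    """
)
load_csv(conn, "businesses", BUSINESSES_CSV)
load_csv(conn, "countries", COUNTRIES_CSV)

# Join the two tables and find the earliest founding year per country.
oldest_per_country = conn.execute(
    """
    SELECT c.country, MIN(b.year_founded) AS oldest_year
    FROM businesses AS b
    JOIN countries AS c ON b.country_code = c.country_code
    GROUP BY c.country
    ORDER BY oldest_year
    """
).fetchall()
print(oldest_per_country)
```

With pandas, the same query result could be read straight into a DataFrame via `pandas.read_sql_query` for further analysis.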

Next Steps:

  • Enhance the project by incorporating more datasets related to businesses.
  • Explore the use of more advanced SQL techniques for further data analysis.
  • Consider visualizing the results using a library like matplotlib or seaborn.


PhD thesis and older projects:

  • Political Impact of Chinese Economic Engagement in Africa: PhD thesis - project that involved conducting a survey-experiment in Kenya and using advanced statistical models (e.g., multilevel, ordinal logistic, panel data model) to provide an in-depth assessment of the political impact of Chinese economic engagement in Africa.
  • Profiling electoral candidates: My first NLP project, which used the quanteda package to run a content analysis of a 2016 presidential debate between the US Democratic Party’s candidates.

Achievements 🏆

Some of the achievements that I have accomplished are:

  • Graduated with a PhD in Political Science from Trinity College Dublin.
  • Completed Accenture’s “Women in Data Science Accelerator”.
  • Won the Irish Research Council Government of Ireland Postgraduate Scholarship, a highly competitive and prestigious research grant with an average success rate of 18% and a total amount of €48,000.


Contact 📫

If you want to reach out to me, you can find me on:

 

Fun facts 🎉

Some fun facts about me are:

  • I am originally from Transylvania.
  • I enjoy eating garlic.
  • I speak several languages: English, French, Spanish and Japanese.
  • I love traveling and exploring new places.

Raluca Nicoara's Projects

data-projects icon data-projects

This repository contains examples of data projects I did or I am currently working on, using R and Python

glassdoor_reviews icon glassdoor_reviews

This repository includes a Data science project about collecting and analyzing Glassdoor reviews

mlbitsandpieces icon mlbitsandpieces

A collection of small-scale projects exploring the fascinating world of Machine Learning and Artificial Intelligence. Each project in this repository represents a step towards understanding and applying ML/AI concepts in real-world scenarios.

python_final_project icon python_final_project

This repository includes the documents for the final team project for the beginner Python course 2019 (Code First Girls). The team was composed of Raluca Nicoara and Alina Ciobanu.

ralucan.github.io icon ralucan.github.io

Use this template if you need a quick developer / data science portfolio! Based on a Minimal Jekyll theme for GitHub Pages.

steam_reviews icon steam_reviews

This project aims to analyze user reviews about the game Valheim on Steam
