The dataset for this Netflix Movies and TV Shows project is available on Kaggle: https://www.kaggle.com/datasets/shivamb/netflix-shows
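Create and activate a conda environment (include Python so that pip installs into the environment):
conda create --name netflix python=3.9
conda activate netflix
Example of adding packages when creating an environment:
conda create -n env-01 python=3.9 scipy numpy
Install or upgrade the plotting libraries:
pip install --upgrade seaborn matplotlib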
- Which movie has the highest number of releases across countries? - Done
- Which actors are most likely to work together?
- What type of content is added in each month, e.g. around holiday seasons (December, July, January), and how much content is released per month?
- Which countries have released the most content? Group this by content type. What are the most common genres in the top 5 countries? Visualise the type of content produced by each country.
- Explore the “Age” of content on Netflix, i.e. the gap between when a movie or show is released and when it is added to Netflix.
- See how the age of content varies per country
- Find out more about movie and TV ratings: visualise TV shows vs movies and group them by target audience, e.g. kids, young adults, teenagers, adults
- Visualise the ratings by country
- Movie and TV show genres: quantity of content released per genre
- Group the genres by content type (Movie vs TV Show)
- Netflix Titles
- Netflix Description
- Splitting the date_added column (see the second link in the references)
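For "Which actors are most likely to work together?", one simple approach is to count how often each pair of actors shares a cast list. This is a minimal sketch assuming the cast column holds comma-separated names (missing values are skipped); actor_pair_counts is a hypothetical helper and the cast lists are toy data.

```python
from collections import Counter
from itertools import combinations


def actor_pair_counts(cast_entries):
    """Count how often each unordered pair of actors appears in the same title.

    Assumes comma-separated cast strings; non-strings (e.g. NaN from pandas)
    and empty strings are ignored.
    """
    pairs = Counter()
    for entry in cast_entries:
        if not isinstance(entry, str) or not entry.strip():
            continue
        # Deduplicate and sort so ("A", "B") and ("B", "A") count as one pair.
        actors = sorted({name.strip() for name in entry.split(",") if name.strip()})
        pairs.update(combinations(actors, 2))
    return pairs


# Toy cast lists for illustration only.
casts = ["Actor A, Actor B, Actor C", "Actor B, Actor A", None, "Actor C"]
pairs = actor_pair_counts(casts)
print(pairs.most_common(3))
```

On the real data this would be called as actor_pair_counts(df["cast"]), and most_common(10) lists the most frequent collaborations.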
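Splitting date_added also answers the monthly and "age" questions. A minimal sketch, assuming date_added strings look like "September 25, 2021" (some with a leading space) and release_year holds integer years; add_date_parts, monthly_additions and the toy rows are illustrative, not taken from the project.

```python
import pandas as pd


def add_date_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Split date_added into year/month columns and compute the content age.

    Assumes date_added strings such as "September 25, 2021"; unparseable
    values become NaT and therefore NaN in the derived columns.
    """
    out = df.copy()
    added = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    out["year_added"] = added.dt.year
    out["month_added"] = added.dt.month
    out["month_name_added"] = added.dt.month_name()
    # Content "age": years between release and being added to Netflix.
    out["content_age"] = out["year_added"] - out["release_year"]
    return out


def monthly_additions(df: pd.DataFrame) -> pd.DataFrame:
    """Count titles added per calendar month, one column per content type."""
    return df.groupby(["month_added", "type"]).size().unstack(fill_value=0)


# Toy rows for illustration only.
sample = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie"],
        "date_added": ["September 25, 2021", " December 1, 2019", "July 4, 2020"],
        "release_year": [2020, 2015, 2020],
    }
)
enriched = add_date_parts(sample)
print(enriched[["year_added", "month_added", "content_age"]])
print(monthly_additions(enriched))
```

Grouping content_age by country (after splitting the comma-separated country column) would then show how the age of content varies per country.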
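Grouping ratings by target audience needs an explicit mapping from rating codes to audience buckets. The mapping below is a hypothetical example (the buckets are an assumption, not an official classification); codes it does not cover, such as NR or UR, fall into "Unknown", and a separate young-adult bucket could be carved out by remapping some codes.

```python
import pandas as pd

# Hypothetical rating -> audience buckets; adjust to match the analysis.
AUDIENCE_BY_RATING = {
    "G": "Kids",
    "TV-Y": "Kids",
    "TV-Y7": "Kids",
    "TV-Y7-FV": "Kids",
    "TV-G": "Kids",
    "PG": "Kids",
    "TV-PG": "Kids",
    "PG-13": "Teenagers",
    "TV-14": "Teenagers",
    "R": "Adults",
    "NC-17": "Adults",
    "TV-MA": "Adults",
}


def add_audience(df: pd.DataFrame) -> pd.DataFrame:
    """Add an 'audience' column derived from the rating code."""
    out = df.copy()
    out["audience"] = out["rating"].map(AUDIENCE_BY_RATING).fillna("Unknown")
    return out


def audience_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate audience buckets against content type (Movie / TV Show)."""
    return pd.crosstab(df["audience"], df["type"])


# Toy rows for illustration only.
sample = pd.DataFrame(
    {
        "type": ["Movie", "Movie", "TV Show", "Movie"],
        "rating": ["TV-MA", "PG-13", "TV-Y", "NR"],
    }
)
labelled = add_audience(sample)
table = audience_by_type(labelled)
print(table)
```

The resulting table can be passed to table.plot(kind="bar") for the TV vs Movies comparison.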
Data Cleaning
Data Exploration
- How do the variables correlate?
- What type of content has Netflix been focusing on over the years?
- Movie and TV Show Duration
- What are the top 10 genres on Netflix?
- Find out more about movie and TV ratings and group them by target audience, e.g. kids, young adults, teenagers, adults
Data Visualisation
- Which countries have contributed the most movies in recent years?
- What does content release on Netflix look like?
- What is the distribution of Netflix’s content by origin (country)?
- What type of content has Netflix been focusing on over the years?
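The country and genre questions share one wrinkle: a title can list several countries or genres in one comma-separated string. A minimal sketch, assuming the type and country columns of the Kaggle file and a genre column named listed_in (an assumption about the dataset's schema), splits these strings into one row per value before counting by content type. Both helper names are hypothetical.

```python
import pandas as pd


def explode_list_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return one row per (title, value) for a comma-separated column."""
    out = df.dropna(subset=[column]).copy()
    out[column] = out[column].str.split(",")
    out = out.explode(column, ignore_index=True)
    out[column] = out[column].str.strip()
    return out[out[column] != ""]


def top_values_by_type(df: pd.DataFrame, column: str, n: int = 5) -> pd.DataFrame:
    """Count titles per value of `column`, split by content type, top n by total."""
    exploded = explode_list_column(df, column)
    counts = exploded.groupby([column, "type"]).size().unstack(fill_value=0)
    counts["total"] = counts.sum(axis=1)
    return counts.sort_values("total", ascending=False).head(n)


# Toy rows for illustration only; assumes 'type', 'country' and 'listed_in' columns.
sample = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "country": ["United States, India", "India", None, "United States"],
        "listed_in": ["Dramas, Comedies", "Kids' TV", "Dramas", "Comedies, Dramas"],
    }
)
print(top_values_by_type(sample, "country"))
print(top_values_by_type(sample, "listed_in", n=10))
```

The same helpers cover the top 10 genres and the top 5 countries; combining the two exploded frames would give the most common genres per country.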
Data Preparation
After downloading the dataset, I load it into a DataFrame for the data cleaning process.
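A minimal loading sketch, assuming the Kaggle download is a CSV named netflix_titles.csv (adjust the path to wherever the file was saved). missing_report is a hypothetical helper that lists each column with null values and its share of rows, a quick way to confirm which columns need filling. The in-memory CSV only stands in for the real file.

```python
import io

import pandas as pd


def load_titles(source) -> pd.DataFrame:
    """Read the Netflix titles CSV from a path or file-like object."""
    return pd.read_csv(source)


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Missing-value count and percentage for each column that has gaps."""
    counts = df.isna().sum()
    report = pd.DataFrame(
        {"missing": counts, "percent": (counts / len(df) * 100).round(1)}
    )
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# With the real file (assumed name): df = load_titles("netflix_titles.csv")
csv_text = """show_id,type,title,director,country,rating
s1,Movie,Title One,,United States,PG-13
s2,TV Show,Title Two,Someone,,TV-MA
s3,Movie,Title Three,,India,
"""
df = load_titles(io.StringIO(csv_text))
print(missing_report(df))
```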
- Filling in the NaN values: making sure there aren't any null values left keeps the data consistent. Columns with null values include:
  - rating
  - date_added
  - director
  - cast
  - country
  - duration
- Deleting redundant columns.
- Handling invalid values in the date_added column: for some titles the year in date_added is earlier than release_year (i.e. the title was added to Netflix before it was released).
  - Drop these rows to ensure data accuracy.
- Dropping duplicates.
- Cleaning individual columns.
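The cleaning steps above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the exact procedure used in the project: director, cast and country are filled with the placeholder "Unknown", while gaps in rating, duration and date_added are filled with each column's most frequent value. Which columns count as redundant is left as a parameter because the project does not name them. Rows whose date_added year is earlier than release_year are then dropped, followed by exact duplicates.

```python
import pandas as pd

# Assumed fill strategy (hypothetical): placeholder text vs. most frequent value.
PLACEHOLDER_COLUMNS = ["director", "cast", "country"]
MODE_COLUMNS = ["rating", "duration", "date_added"]


def clean_titles(df: pd.DataFrame, redundant_columns=()) -> pd.DataFrame:
    out = df.copy()

    # 1. Fill in NaN values.
    for col in PLACEHOLDER_COLUMNS:
        if col in out:
            out[col] = out[col].fillna("Unknown")
    for col in MODE_COLUMNS:
        if col in out:
            modes = out[col].mode()
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])

    # 2. Delete redundant columns (the caller decides which).
    out = out.drop(columns=list(redundant_columns))

    # 3. Drop rows added to Netflix before their release year.
    year_added = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    ).dt.year
    out = out[~(year_added < out["release_year"])]

    # 4. Drop exact duplicate rows.
    return out.drop_duplicates().reset_index(drop=True)


# Toy rows for illustration only.
raw = pd.DataFrame(
    {
        "show_id": ["s1", "s2", "s3", "s3"],
        "type": ["Movie", "TV Show", "Movie", "Movie"],
        "director": [None, "D. One", None, None],
        "cast": ["Actor A", None, "Actor B", "Actor B"],
        "country": ["India", None, "Spain", "Spain"],
        "date_added": ["March 1, 2019", "June 5, 2018", "July 10, 2016", "July 10, 2016"],
        "release_year": [2018, 2020, 2015, 2015],
        "rating": ["TV-MA", None, "PG", "PG"],
        "duration": ["90 min", "1 Season", "100 min", "100 min"],
    }
)
cleaned = clean_titles(raw)
print(cleaned)
```

In the toy data, s2 is dropped because it was "added" in 2018 but released in 2020, and the second s3 row is dropped as a duplicate.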
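Duration is a good example of a column that needs individual cleaning: it mixes minutes for movies ("90 min") with seasons for TV shows ("2 Seasons"). A minimal sketch, assuming that "number unit" format, splits it into a numeric value and a normalised unit with a regular expression; split_duration and the output column names are hypothetical.

```python
import pandas as pd


def split_duration(df: pd.DataFrame) -> pd.DataFrame:
    """Split 'duration' into a numeric value and a unit ('min' or 'season').

    Assumes values such as "90 min" or "2 Seasons"; anything else becomes NaN.
    """
    out = df.copy()
    parts = out["duration"].str.extract(r"(?P<value>\d+)\s*(?P<unit>[A-Za-z]+)")
    out["duration_value"] = pd.to_numeric(parts["value"])
    # Normalise "Seasons"/"Season" to "season" so TV shows share one unit label.
    out["duration_unit"] = parts["unit"].str.lower().str.rstrip("s")
    return out


# Toy rows for illustration only.
sample = pd.DataFrame(
    {
        "type": ["Movie", "TV Show", "TV Show", "Movie"],
        "duration": ["90 min", "2 Seasons", "1 Season", None],
    }
)
parsed = split_duration(sample)
print(parsed)
# Average movie length in minutes (mean() skips the NaN duration).
print(parsed.loc[parsed["type"] == "Movie", "duration_value"].mean())
```

Keeping value and unit separate lets movie lengths and season counts be analysed on their own scales.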
References
- https://www.analyticsvidhya.com/blog/2021/07/visualizing-netflix-data-using-python/
- https://jovian.com/janecww415/netflix-movies-and-shows-analysis
- https://www.kaggle.com/code/thiagopanini/insights-from-netflix-the-show-must-go-on/notebook ***
- https://www.dataquest.io/blog/comical-data-visualization-in-python-using-matplotlib/ ***
- https://www.kaggle.com/code/nikunjmalpani/netflix-movies-and-tv-shows-data-visualization
- https://jobymathew97.medium.com/netflix-movies-and-tv-shows-data-visualization-using-matplotlib-f1b4e91b5226
- https://www.nomidl.com/python/netflix-data-analysis-project-using-python/
- https://github.com/nataliafonseca/netflix-data-analysis/blob/main/notebook.ipynb
- https://app.datacamp.com/workspace/w/cc0a1d5f-0b59-4555-bc66-87d9dd3d5e96 ***
- https://medium.com/@linhvu.nt/data-analysis-and-recommendations-on-netflix-content-28707163553a ***
- https://jovian.com/astha1998/netflix-data-analysis-project