A sample space event (or simply an event) is a specific outcome, or set of outcomes, that can occur within the sample space of an experiment or trial.
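As a minimal sketch, consider rolling a fair die: the sample space is the six faces, and an event (such as "roll an even number") is a subset of that space.

```python
# Sample space of a single die roll
sample_space = {1, 2, 3, 4, 5, 6}

# The event "roll an even number" is a subset of the sample space
even_event = {x for x in sample_space if x % 2 == 0}

print(even_event)                   # outcomes in the event
print(even_event <= sample_space)   # an event is always a subset of the sample space
```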
Probability in data analytics is a powerful tool for dealing with uncertainty, modeling variability in data, and making informed decisions based on available information. It forms the basis for statistical reasoning, hypothesis testing, and various machine learning techniques.
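Under the classical definition, and assuming all outcomes are equally likely, the probability of an event is the size of the event divided by the size of the sample space:

```python
# Classical probability: |event| / |sample space|,
# assuming all outcomes are equally likely
sample_space = {1, 2, 3, 4, 5, 6}
event = {2, 4, 6}   # "roll an even number"

p = len(event) / len(sample_space)
print(p)  # 0.5
```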
• Mean: The sum of all values in a dataset divided by the number of values. It provides a measure of the central location and is sensitive to extreme values.
• Median: The middle value when the data are sorted. If there's an even number of observations, it's the average of the two middle values.
• Mode: The mode is the value that occurs most frequently in a dataset.
• Range: The range is the difference between the maximum and minimum values in a dataset.
• Variance: Variance measures how far each data point in a dataset is from the mean. It is the average of the squared differences from the mean. It quantifies the overall variability in the dataset.
• Standard Deviation: The standard deviation is the square root of the variance. It provides a measure of how spread out the values are around the mean. It is a widely used indicator of the amount of variation or dispersion in a dataset.
• Quartile: Quartiles divide a dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data falls. Quartiles are useful for understanding the distribution and identifying potential outliers.
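All of the measures above are available as pandas methods. The sketch below uses a small made-up dataset; `ddof=0` requests the population variance (the plain average of squared deviations described above):

```python
import pandas as pd

data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical sample

mean = data.mean()                    # 5.0
median = data.median()                # 4.5 (average of the two middle values)
mode = data.mode()[0]                 # 4 (most frequent value)
value_range = data.max() - data.min() # 7

variance = data.var(ddof=0)           # 4.0 (average squared deviation from the mean)
std = data.std(ddof=0)                # 2.0 (square root of the variance)

# Quartiles split the sorted data into four equal parts
q1, q2, q3 = data.quantile([0.25, 0.5, 0.75])
```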
Correlation analysis, often computed using the corr() function, is a statistical method used to measure the strength and direction of a linear relationship between two variables.
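A quick sketch of `corr()` with a hypothetical dataset; values near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship:

```python
import pandas as pd

df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score":    [52, 55, 61, 68, 74],
})

# Pearson correlation matrix for all numeric columns
print(df.corr())

# Correlation between two specific columns
r = df["hours_studied"].corr(df["exam_score"])
```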
Anomalies, in the context of data analysis, refer to values or patterns that deviate significantly from the expected or typical behavior of the data.
Outliers are data points that stand out from the rest of the dataset because they are significantly higher or lower than most of the other values.
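One common way to flag outliers is the IQR rule, which builds directly on the quartiles defined earlier: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged. A sketch with hypothetical data:

```python
import pandas as pd

data = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])

q1, q3 = data.quantile([0.25, 0.75])
iqr = q3 - q1                         # interquartile range

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = data[(data < lower) | (data > upper)]
print(outliers.tolist())  # 102 stands far above the rest
```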
Hypothesis testing is a statistical method used to make inferences or draw conclusions about a population based on sample data.
A common example is the paired-samples test, typically applied to data such as before-and-after blood pressure measurements for the same individuals.
- H₀: the difference between the two samples is 0 (no effect)
- H₁: the difference between the two samples is not 0 (there is an effect)
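A sketch of such a test using SciPy's paired t-test, with hypothetical before/after blood-pressure readings (the data and the 0.05 significance level are assumptions for illustration):

```python
from scipy import stats

# Hypothetical before/after blood-pressure readings for the same individuals
before = [140, 150, 138, 145, 160, 155, 148, 152]
after  = [132, 144, 135, 140, 151, 148, 142, 146]

# Paired t-test: H0 says the mean difference is 0
t_stat, p_value = stats.ttest_rel(before, after)

if p_value < 0.05:
    print("Reject H0: the measurements changed significantly")
else:
    print("Fail to reject H0")
```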
Simple linear regression fits a straight line to the data, making it a powerful tool for understanding and predicting how one variable influences another.
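A minimal sketch using SciPy's `linregress`, with made-up x/y data; the fitted slope and intercept define the line y = slope·x + intercept, which can then be used for prediction:

```python
from scipy import stats

x = [1, 2, 3, 4, 5]        # hypothetical predictor, e.g. years of experience
y = [30, 35, 41, 44, 50]   # hypothetical response, e.g. salary in thousands

result = stats.linregress(x, y)
print(f"y = {result.slope:.2f} * x + {result.intercept:.2f}")

# Predict the response for an unseen x value
prediction = result.slope * 6 + result.intercept
```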
A line chart is a type of data visualization that displays data points over a continuous interval or time span and connects these points with straight lines. It is particularly useful for showing trends, patterns, and changes in data over time.
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
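A minimal Matplotlib line chart, using made-up monthly sales figures (the non-interactive `Agg` backend is selected so the script also runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe on headless machines
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 128, 150, 170]   # hypothetical monthly sales

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")  # points connected with straight lines
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly Sales Trend")
fig.savefig("sales_trend.png")
```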
Seaborn is a popular Python data visualization library built on top of Matplotlib. Seaborn simplifies the process of creating complex visualizations, such as statistical plots, heatmaps, pair plots, and more, with less code. Seaborn complements Matplotlib and is often used for data analysis and exploration due to its simplified and informative plotting capabilities.
- When the data are categorical, a bar graph is a good way to visualize them.
- Bar charts are also usually chosen to highlight increases or decreases over a period of time.
• Bar charts are used to view the frequency of categorical data.
• Bar charts in Seaborn are created with the barplot() function.
The Seaborn library provides a barplot() function that can automatically compute averages within each category.
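A sketch of that behavior with a small made-up dataset: barplot() averages the repeated `sales` values within each `region` before drawing the bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe on headless machines
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical dataset with repeated observations per category
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "sales":  [100, 120, 80, 90, 100],
})

# barplot() automatically averages sales within each region
ax = sns.barplot(data=df, x="region", y="sales")
ax.set_ylabel("Average sales")
plt.savefig("sales_by_region.png")
```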
• The histogram visualization is also bar-shaped.
• However, a histogram shows the frequency of numerical data grouped into intervals (bins), so both of its axes are numerical.
A bin, sometimes called a class interval, is a way of grouping data in a histogram.
• Histograms are commonly used to view the frequency of continuous numerical data.
• Bar charts are commonly used to view the frequency of categorical data.
Adjusting bin size using NumPy allows you to control the width of the intervals in a histogram, providing a more detailed or coarse view of your data. Smaller bin sizes capture finer patterns, while larger ones smooth out the data's distribution.
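A sketch of that trade-off, using NumPy to build the bin edges explicitly for simulated data (the normal distribution and the 5-bin vs 30-bin comparison are illustrative choices):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe on headless machines
import matplotlib.pyplot as plt

# Simulated data: 1000 draws from a normal distribution
rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=1000)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Coarse view: 5 wide bins smooth out the distribution
axes[0].hist(data, bins=np.linspace(data.min(), data.max(), 6))
axes[0].set_title("5 bins")

# Detailed view: 30 narrow bins reveal finer structure
axes[1].hist(data, bins=np.linspace(data.min(), data.max(), 31))
axes[1].set_title("30 bins")

fig.savefig("bin_sizes.png")
```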