Data-Analysis-and-Data-Visualization

Sample Space Event

The sample space is the set of all possible outcomes of an experiment or trial. An event is a subset of the sample space: one or more specific outcomes that can occur. Example:

image
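As a minimal sketch of the definition above (the two-dice experiment is an illustrative assumption, not taken from the original example image), the sample space and an event can be enumerated with Python sets:

```python
from itertools import product

# Sample space: every ordered outcome of rolling two six-sided dice.
sample_space = set(product(range(1, 7), repeat=2))

# Event: the subset of outcomes whose faces sum to 7.
event_sum_is_7 = {outcome for outcome in sample_space if sum(outcome) == 7}

print(len(sample_space))       # 36 possible outcomes
print(sorted(event_sum_is_7))  # the 6 outcomes that make up the event
```

Because an event is just a subset of the sample space, set operations such as union and intersection combine events directly.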

Probability

Probability measures how likely an event is to occur, expressed as a number between 0 (impossible) and 1 (certain). In data analytics it is used to deal with uncertainty, model variability in data, and make informed decisions based on available information. It forms the basis for statistical reasoning, hypothesis testing, and various machine learning techniques. Example:

image
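A small sketch of classical probability, assuming a fair six-sided die where every outcome is equally likely. The `probability` helper is a hypothetical name introduced here for illustration:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event, space):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & space), len(space))


even = {2, 4, 6}
greater_than_4 = {5, 6}

p_even = probability(even, sample_space)                           # 1/2
p_gt4 = probability(greater_than_4, sample_space)                  # 1/3
p_even_or_gt4 = probability(even | greater_than_4, sample_space)   # 2/3
p_even_and_gt4 = probability(even & greater_than_4, sample_space)  # 1/6

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p_even_or_gt4 == p_even + p_gt4 - p_even_and_gt4)  # True
```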

Measures of Central Tendency and Dispersion

• Mean: The sum of all values in a dataset divided by the number of values. It provides a measure of the central location and is sensitive to extreme values.

• Median: The middle value of a dataset sorted in order. If there's an even number of observations, it's the average of the two middle values.

• Mode: The mode is the value that occurs most frequently in a dataset.

• Range: The range is the difference between the maximum and minimum values in a dataset.

• Variance: Variance measures how far each data point in a dataset is from the mean. It is the average of the squared differences from the mean. It quantifies the overall variability in the dataset.

• Standard Deviation: The standard deviation is the square root of the variance. It provides a measure of how spread out the values are around the mean. It is a widely used indicator of the amount of variation or dispersion in a dataset.

• Quartile: Quartiles divide a dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data falls. Quartiles are useful for understanding the distribution and identifying potential outliers.
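The measures listed above can be sketched with Python's standard `statistics` module. The dataset is made up for illustration, and the population formulas (`pvariance`, `pstdev`) are used because the text defines variance as the plain average of squared differences. The sample versions `variance` and `stdev` divide by n - 1 instead.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)            # 6
median = statistics.median(data)        # 5
mode = statistics.mode(data)            # 4
value_range = max(data) - min(data)     # 9
variance = statistics.pvariance(data)   # average squared distance from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

# Quartiles: three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(mean, median, mode, value_range)
print(round(variance, 3), round(std_dev, 3))
print(q1, q2, q3)  # q2 equals the median
```

Different tools interpolate quartiles slightly differently, so Q1 and Q3 may not match across libraries on small datasets.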

Exploratory Data Analysis Methods

Correlation Analysis

Correlation analysis is a statistical method used to measure the strength and direction of the linear relationship between two variables. In pandas it is often computed with the corr() method, which returns Pearson correlation coefficients between -1 and 1 by default.

image
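A minimal sketch of `corr()` in pandas. The column names and values below are invented for illustration and are not the dataset shown in the original image:

```python
import pandas as pd

# Hypothetical data: study time, exam score, and gaming time per student.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 64, 70, 74],
    "hours_gaming": [6, 5, 5, 3, 2, 1],
})

# Pairwise Pearson correlation between all numeric columns.
corr_matrix = df.corr()
print(corr_matrix.round(2))

# Correlation between two specific columns.
r = df["hours_studied"].corr(df["exam_score"])
print(round(r, 3))
```

Values near 1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship.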

Anomalies

Anomalies, in the context of data analysis, refer to values or patterns that deviate significantly from the expected or typical behavior of the data.

image
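One common way to flag anomalies is the z-score: how many standard deviations a value lies from the mean. The sketch below is not necessarily the method in the original image; the readings and the threshold of 2 are assumptions chosen for illustration.

```python
import statistics

# Hypothetical sensor readings with one unusual value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 45]

mean = statistics.mean(readings)
std_dev = statistics.pstdev(readings)

Z_THRESHOLD = 2.0  # assumed cut-off; 2 or 3 are typical choices

# z-score: distance from the mean in units of standard deviation.
z_scores = [(x - mean) / std_dev for x in readings]
anomalies = [x for x, z in zip(readings, z_scores) if abs(z) > Z_THRESHOLD]

print(anomalies)  # [45]
```

On very small datasets a single extreme value inflates the standard deviation, so the z-score of that value cannot grow very large. This is why a lower threshold is used here.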

Outliers

Outliers are data points that stand out from the rest of the dataset because they are significantly higher or lower than most of the other values.

image

image
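A widely used rule for outliers, and the one behind box-plot whiskers, marks any value more than 1.5 times the interquartile range (IQR) below Q1 or above Q3. The following is a sketch with made-up values, not the original notebook's code:

```python
import statistics

values = [12, 14, 14, 15, 16, 17, 18, 19, 21, 48]

# Quartiles and interquartile range.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences: anything outside [lower, upper] is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower or v > upper]

print(q1, q3, iqr)   # 14.25 18.75 4.5
print(lower, upper)  # 7.5 25.5
print(outliers)      # [48]
```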

Hypothesis Testing

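Hypothesis testing is a statistical method for drawing conclusions about a population from sample data. A null hypothesis (H₀) is tested against an alternative hypothesis (H₁), and H₀ is rejected when the observed data would be sufficiently unlikely if it were true.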

Paired Sample T-test

A paired sample t-test compares two related sets of measurements taken on the same subjects, such as before-and-after blood pressure measurements for individuals. It is a univariate test on the per-subject differences.

  • H₀: the mean difference between the two samples is 0
  • H₁: the mean difference between the two samples is not 0

image
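The test statistic can be sketched by hand with the standard library: t = mean(d) / (sd(d) / √n), where d are the per-subject differences. The blood pressure values are invented for illustration. The critical value 2.365 is the two-sided 5% cut-off for 7 degrees of freedom, so it applies only to this sample size.

```python
import math
import statistics

# Hypothetical systolic blood pressure for 8 people, before and after treatment.
before = [140, 152, 138, 145, 160, 155, 148, 150]
after = [132, 145, 135, 138, 150, 149, 140, 144]

# The test works on the per-subject differences.
differences = [b - a for b, a in zip(before, after)]
n = len(differences)

mean_diff = statistics.mean(differences)
std_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)

# t statistic under H0: the mean difference is 0.
t_statistic = mean_diff / (std_diff / math.sqrt(n))

T_CRITICAL = 2.365  # two-sided, alpha = 0.05, df = n - 1 = 7
reject_h0 = abs(t_statistic) > T_CRITICAL

print(round(t_statistic, 2), reject_h0)
```

In practice, `scipy.stats.ttest_rel(before, after)` computes the same statistic and also returns a p-value, which avoids looking up a critical value.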

Simple Linear Regression

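Simple linear regression models the relationship between one independent variable (x) and one dependent variable (y) with a straight line, y = b₀ + b₁x. It is used to understand and predict how one variable changes with another.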

EDA

image

image

image

Preprocessing for Modeling

image

Splitting Training and Test Set

image

Fitting the Model to the Training Set

image

Predict The Result

image

Plot The Result

image

Evaluate Model

image
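The steps above (split, fit, predict, evaluate) can be sketched end to end. This assumes scikit-learn, which the original screenshots may or may not have used, and synthetic data generated from y = 2x + 1 plus noise in place of the original dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): a linear trend with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))  # one feature, as a 2-D array
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1.0, size=100)

# Split: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model on the training set only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on unseen data.
y_pred = model.predict(X_test)

# Evaluate: mean squared error and coefficient of determination.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"slope={model.coef_[0]:.2f} intercept={model.intercept_:.2f}")
print(f"MSE={mse:.2f} R2={r2:.2f}")
```

To plot the result, a scatter plot of `X_test` against `y_test` with the line `model.predict(X_test)` drawn on top shows how well the fitted line follows the held-out points.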

Matplotlib and Seaborn

Line Chart

A line chart is a type of data visualization that displays data points over a continuous interval or time span and connects these points with straight lines. It is particularly useful for showing trends, patterns, and changes in data over time.

MATPLOTLIB

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

image
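A minimal Matplotlib line chart might look like the following. The monthly sales figures are placeholder data, not the values from the original image:

```python
import matplotlib.pyplot as plt

# Placeholder data: one value per month.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 162, 171]

fig, ax = plt.subplots(figsize=(6, 4))

# Connect the points with straight lines and mark each observation.
ax.plot(months, sales, marker="o")

ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
fig.tight_layout()
```

Call `plt.show()` in an interactive session, or `fig.savefig(...)` with a file name of your choice, to see the chart.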

SEABORN

Seaborn is a popular data visualization library built on top of Matplotlib in Python. Seaborn simplifies the process of creating complex visualizations, such as statistical plots, heatmaps, pair plots, and more, with less code. Seaborn complements Matplotlib and is often used for data analysis and exploration due to its simplified and informative plotting capabilities.

image

Bar Graph

  • Bar graphs are used to visualize categorical data.
  • Bar charts are also usually chosen to highlight increases or decreases over a period of time.

Using Matplotlib

image

image

image
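A short sketch of a Matplotlib bar chart, using invented category counts in place of the original data:

```python
import matplotlib.pyplot as plt

# Placeholder categorical data: number of orders per product category.
categories = ["Books", "Games", "Music", "Toys"]
counts = [34, 21, 15, 27]

fig, ax = plt.subplots(figsize=(6, 4))

# One bar per category; the bar height is the category's value.
ax.bar(categories, counts, color="steelblue")

ax.set_title("Orders per category")
ax.set_xlabel("Category")
ax.set_ylabel("Number of orders")
```

`ax.barh(categories, counts)` draws the same data as horizontal bars, which can be easier to read when category names are long.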

Using seaborn

• Bar charts are used to view the frequency of categorical data.

• Bar charts in Seaborn use the barplot() function.

Load dataset from seaborn

image

The Seaborn barplot() function automatically computes the mean of each category by default.

image

image
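To illustrate the averaging behaviour, here is a sketch with a tiny hand-made DataFrame. The original loads a dataset from Seaborn, and the column names here are only assumed for illustration. Each category has several rows, and `barplot()` draws one bar at their mean:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: several bills per day.
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 15.0, 25.0, 30.0, 50.0],
})

fig, ax = plt.subplots(figsize=(6, 4))

# barplot aggregates rows per category; the default estimator is the mean.
sns.barplot(data=df, x="day", y="total_bill", ax=ax)

# The same averages computed explicitly, for comparison with the bar heights.
means = df.groupby("day", sort=False)["total_bill"].mean()
print(means)
```

The thin lines drawn on top of each bar are error bars showing the uncertainty of the estimated mean.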

Histogram

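• A histogram is also drawn with bars.

• Unlike a bar chart, a histogram shows the frequency distribution of a single numerical variable: the values are grouped into intervals (bins), and each bar's height is the number of observations in that interval.

Load dataset from seaborn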

image

image

image

image

A bin, sometimes called a class interval, is a range of values used to group the data in a histogram. Each bar counts the observations that fall inside one bin.

image

image

image
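What a bin does can be seen without drawing anything. The sketch below uses `numpy.histogram` on a made-up list of ages to show the bin edges and how many values land in each bin:

```python
import numpy as np

# Hypothetical ages of 12 people.
ages = np.array([22, 25, 27, 31, 33, 35, 38, 41, 44, 47, 52, 58])

# Split the range [22, 58] into 4 equal-width bins and count values per bin.
counts, bin_edges = np.histogram(ages, bins=4)

print(bin_edges)  # [22. 31. 40. 49. 58.]
print(counts)     # [3 4 3 2]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why 31 is counted in the second bin and 58 in the last.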

Pro Tips

• Histograms are commonly used to view the frequency of continuous numerical data.

• Bar charts are commonly used to view the frequency of categorical data.

Changing the size of bins using NumPy

Adjusting bin size using NumPy allows you to control the width of the intervals in a histogram, providing a more detailed or coarse view of your data. Smaller bin sizes capture finer patterns, while larger ones smooth out the data's distribution.

Using Matplotlib

image

image

image

image
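One way to control bin width in Matplotlib is to build the bin edges with `np.arange` and pass them to `hist()`. The sketch below uses randomly generated values, since the original dataset is not available, and compares wide and narrow bins side by side:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): 500 values centred on 50.
rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=10, size=500)

# Bin edges from 0 to 100: width 10 (coarse) and width 2 (fine).
wide_bins = np.arange(0, 101, 10)
narrow_bins = np.arange(0, 101, 2)

fig, (ax_wide, ax_narrow) = plt.subplots(1, 2, figsize=(10, 4))

counts_wide, _, _ = ax_wide.hist(values, bins=wide_bins, edgecolor="black")
ax_wide.set_title("Bin width 10")

counts_narrow, _, _ = ax_narrow.hist(values, bins=narrow_bins, edgecolor="black")
ax_narrow.set_title("Bin width 2")
```

The stop value is 101 rather than 100 because `np.arange` excludes its stop value. Using 101 keeps 100 as the final bin edge.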

Using Seaborn

image

image

image

Multiple Histogram

image
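Several histograms can share one set of axes so that distributions are easy to compare. The sketch below uses two synthetic groups and Matplotlib, which may differ from the library in the original image. Both groups share the same bin edges and are drawn semi-transparent.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): scores for two groups.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=60, scale=8, size=300)
group_b = rng.normal(loc=75, scale=8, size=300)

# Shared bin edges keep the two histograms directly comparable.
bins = np.arange(20, 121, 5)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(group_a, bins=bins, alpha=0.6, label="Group A")
ax.hist(group_b, bins=bins, alpha=0.6, label="Group B")

ax.set_xlabel("Score")
ax.set_ylabel("Frequency")
ax.legend()
```

The `alpha` value makes the bars translucent so the overlapping region of the two distributions stays visible.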
