odd2023---datascience---ex-02's Introduction

Ex02-Outlier

You are given bhp.csv, which contains property prices in the city of Bangalore, India. Examine the price_per_sqft column and do the following:

(1) Remove outliers using IQR

(2) After removing outliers in step 1, you get a new dataframe.

(3) Use a z-score threshold of 3 to remove outliers. This is similar in spirit to the IQR method, but the two methods do not necessarily remove the same rows.

(4) For the data set height_weight.csv, find the following:

(i) Using IQR, detect weight outliers and print them

(ii) Using IQR, detect height outliers and print them
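The two rules named in the tasks above can be compared on a small sample. The following is a minimal sketch, not the exercise's own solution: the helper names `iqr_bounds` and `zscore_mask` and the toy values are hypothetical, and the z-score is computed by hand with the population standard deviation (ddof=0), which is also the default of `scipy.stats.zscore`.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) IQR fences for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def zscore_mask(series, threshold=3.0):
    """Boolean mask that is True where |z| < threshold (population std, ddof=0)."""
    z = (series - series.mean()) / series.std(ddof=0)
    return z.abs() < threshold

# Hypothetical toy values, not taken from bhp.csv
prices = pd.Series([10, 11, 12, 12, 13, 13, 14, 15, 16, 40], dtype=float)

lower, upper = iqr_bounds(prices)
kept_by_iqr = prices[(prices >= lower) & (prices <= upper)]
kept_by_z = prices[zscore_mask(prices)]

# The two rules may keep a different number of rows
print(len(kept_by_iqr), len(kept_by_z))
```

On this toy sample the IQR fences drop the value 40 while the z-score rule keeps it, because a single extreme value inflates the standard deviation it is measured against. Results on the real files may differ.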

Aim:

To detect and remove the outliers in the given data set and save the final data.

Algorithm:

Step 1 Import the required packages (pandas, numpy, scipy, seaborn).

Step 2 Read the given csv file

Step 3 Convert the file into a dataframe and get information of the data.

Step 4 Remove the non numerical data columns using drop() method.

Step 5 Detect the outliers in the data set using the z-score method.

Step 6 Remove the outliers by z-score filtering and list manipulation, or by using the interquartile range (IQR).

Step 7 Check whether the outliers have been removed from the data set using graphical methods.

Step 8 Save the final data set into the file.
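Steps 4, 5, 6 and 8 can be sketched end to end on a toy frame. This is an illustration under stated assumptions rather than the exercise's solution: the `location` column, the sample values and the output file name `cleaned_example.csv` are all hypothetical, and the file is written to the system temporary directory so that nothing in the working directory is overwritten.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy frame standing in for the CSV read in Step 2
raw = pd.DataFrame({
    "location": list("abcdefghijkl"),
    "price_per_sqft": [4000, 4200, 4100, 3900, 4300, 4050,
                       4150, 3950, 4250, 4000, 4100, 90000],
})

# Step 4: remove the non-numerical column with drop()
numeric = raw.drop(columns=["location"])

# Step 5: absolute z-score of every value in the column
z = np.abs(stats.zscore(numeric["price_per_sqft"]))

# Step 6: keep only the rows whose |z| is below 3
cleaned = numeric[z < 3]

# Step 8: save the cleaned data (hypothetical file name, temp directory)
out_path = os.path.join(tempfile.gettempdir(), "cleaned_example.csv")
cleaned.to_csv(out_path, index=False)

print(len(raw), len(cleaned))
```

Step 7 (the graphical check) is left out here; a box plot of `cleaned["price_per_sqft"]` would serve that purpose.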

Program:

FOR BHP.CSV FILE

# Libraries used below
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats

df=pd.read_csv("/bhp.csv")
df
df.info()
df.shape
# Box plot of the raw column, outliers included
sns.boxplot(x="price_per_sqft",data=df)
# IQR fences: 1.5*IQR below Q1 and above Q3
Q1=df['price_per_sqft'].quantile(0.25)
Q3=df['price_per_sqft'].quantile(0.75)
IQR=Q3-Q1
lower=Q1-1.5*IQR
upper=Q3+1.5*IQR
# Keep only the rows inside the fences
newdata=df[(df['price_per_sqft']>=lower) & (df['price_per_sqft']<=upper)]
print(newdata)
# Rows outside the fences are the outliers
outlier=df[(df['price_per_sqft']<lower) | (df['price_per_sqft']>upper)]
print(outlier)
newdata.shape
sns.boxplot(x="price_per_sqft",data=newdata)
# z-score method: keep the rows with |z| < 3
z_score=np.abs(stats.zscore(df['price_per_sqft']))
newdata2=df[(z_score<3)]
print(newdata2)
outlier2=df[(z_score>=3)]
print(outlier2)
newdata2.shape
sns.boxplot(x="price_per_sqft",data=newdata2)

FOR HEIGHT_WEIGHT.CSV FILE

import pandas as pd
import seaborn as sns

df=pd.read_csv("/height_weight.csv")
df
df.info()
df.shape
df.describe()
# Box plot of the raw column, outliers included
sns.boxplot(x="height",data=df)
# IQR fences: 1.5*IQR below Q1 and above Q3
Q1=df['height'].quantile(0.25)
Q3=df['height'].quantile(0.75)
IQR=Q3-Q1
lower=Q1-1.5*IQR
upper=Q3+1.5*IQR
# Rows outside the fences are the height outliers
outlier=df[(df['height']<lower) | (df['height']>upper)]
print(outlier)
# Keep only the rows inside the fences
newdata=df[(df['height']>=lower) & (df['height']<=upper)]
print(newdata)
sns.boxplot(x='height',data=newdata)
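Task (4) also asks for the weight outliers. One way to avoid repeating the IQR steps per column is a small helper; the sketch below is an assumption-laden illustration, not part of the original program. The function name `iqr_outliers` is hypothetical, the column name `weight` is assumed by analogy with `height`, and the rows are made-up toy values rather than the contents of height_weight.csv.

```python
import pandas as pd

def iqr_outliers(frame, column, k=1.5):
    """Return the rows of `frame` whose `column` value lies outside the IQR fences."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return frame[(frame[column] < lower) | (frame[column] > upper)]

# Hypothetical toy rows; the column names are assumptions
people = pd.DataFrame({
    "height": [5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6.0, 9.5],
    "weight": [20, 150, 155, 160, 165, 170, 175, 180],
})

height_outliers = iqr_outliers(people, "height")
weight_outliers = iqr_outliers(people, "weight")
print(height_outliers)
print(weight_outliers)
```

With the real file, the same call would be `iqr_outliers(df, "weight")`, provided the column is actually named that way.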

OUTPUT:

Original data for bhp.csv file:

image

Dataset information:

image

Shape of the data:

image

Box plot of the price_per_sqft column before removing outliers:

image

Removing the outliers from the price_per_sqft column using IQR:

image

image

Box plot of the price_per_sqft column after IQR:

image

Removing the outliers from the price_per_sqft column using a z-score of 3:

image

image

Box plot of the price_per_sqft column after a z-score of 3:

image

Original data of height_weight.csv file for height:

image

Dataset information:

image

Shape of the dataset:

image

Box plot of the height column before removing outliers:

image

Removing the outliers from the height column using IQR:

image

image

Box plot of the height column after IQR:

image

Original data of height_weight.csv file for weight:

image

Dataset information:

image

Shape of the dataset:

image

Box plot of the weight column before removing outliers:

image

Removing the outliers from the weight column using IQR:

image

image

Box plot of the weight column after IQR:

image

Result:

Thus, the outliers are detected and removed from the given files, and the final data set is saved to a file.

Contributors

karthi-govindharaju, yuvaranithulasingam
