
assignment10_end's Introduction

1. Precision, Recall and F1 Score

To understand these metrics, we will use the heart disease dataset, where we have to predict whether a patient is suffering from a heart ailment or not.

A confusion matrix helps us gain insight into how correct our predictions were and how they hold up against the actual values.

[Image: confusion matrix for the heart disease predictions]

1: patients who have heart disease. 0: patients who don’t have heart disease.

Actual Values :

  1. Patients who actually don’t have heart disease: 40
  2. Patients who actually have heart disease: 51

Predicted Values :

  1. Patients who were predicted as not having heart disease: 41
  2. Patients who were predicted as having heart disease: 50

True Positives: patients who have heart disease and whom the model also predicted as having heart disease.
False Positives: patients who do not have heart disease but whom the model predicted as having heart disease.
False Negatives: patients who have heart disease but whom the model predicted as not having heart disease.
True Negatives: patients who do not have heart disease and whom the model also predicted as not having heart disease.
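
As a quick illustration, here is a minimal sketch of how these four counts can be read off a confusion matrix with scikit-learn. The `y_true` and `y_pred` lists are made-up labels, not the actual dataset.

```python
# Minimal sketch: extracting TP/FP/FN/TN from predictions with scikit-learn.
# y_true / y_pred are hypothetical label lists (1 = heart disease, 0 = no disease).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
```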

Precision is the ratio between the true positives and all the predicted positives. Basically, it tells you what proportion of positive identifications were actually correct.

Precision = True Positives / (True Positives + False Positives)

True Positives (TP) = 43, False Positives (FP) = 7, Predicted Positives = 50

Precision = True Positives / (True Positives + False Positives) = 43 / 50 = 0.86

Recall is the ratio between the true positives and all the actual positives. Basically, it tells you what proportion of actual positives were identified.

Recall = True Positives / (True Positives + False Negatives)

True Positives (TP) = 43, False Negatives (FN) = 8, Actual Positives = 51

Recall = True Positives / (True Positives + False Negatives) = 43 / 51 ≈ 0.84

The F1 score is a function of precision and recall. It is needed when you want a balance between recall and precision.

There are a lot of situations where precision and recall are equally important; in such cases we use the F1 score, which is the harmonic mean of recall and precision.

F1 = 2 × (Precision × Recall) / (Precision + Recall)

F1 score = 2 * ((0.86 * 0.84) / (0.86 + 0.84)) = 0.85
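
A minimal sketch of the same arithmetic in Python, using the counts quoted above (TP = 43, FP = 7, FN = 8); the variable names are just for illustration.

```python
# Sketch: precision, recall and F1 from the counts quoted above.
tp, fp, fn = 43, 7, 8

precision = tp / (tp + fp)                            # 43 / 50 = 0.86
recall = tp / (tp + fn)                               # 43 / 51 ≈ 0.84
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.86 recall=0.84 f1=0.85
```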

2. BLEU

Bilingual Evaluation Understudy (BLEU) is an algorithm for evaluating the quality of text that has been machine-translated from one natural language to another.

“The closer a machine translation is to a professional human translation, the better it is.” — BLEU: a Method for Automatic Evaluation of Machine Translation, Kishore Papineni et al.

To measure machine translation effectiveness, we evaluate the closeness of the machine translation to a human reference translation using BLEU.

BLEU compares the n-grams of the candidate translation with the n-grams of the reference translation to count the number of matches. The more matches between the candidate and the reference translation, the better the machine translation. In its simplest form, BLEU can be computed as the ratio of covered candidate words to the total number of candidate words: BLEU = covered candidate words / total number of words in the candidate.
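
A minimal sketch of this simplified word-coverage computation (clipped unigram precision); the sentences are made up, and the full BLEU metric additionally combines higher-order n-grams and a brevity penalty.

```python
# Sketch of the word-coverage ratio described above (clipped unigram precision).
from collections import Counter

reference = "the cat is on the mat".split()
candidate = "the the cat sat on the mat".split()

ref_counts = Counter(reference)
cand_counts = Counter(candidate)

# A candidate word is "covered" only up to the number of times it appears
# in the reference (clipping prevents rewarding repeated words).
covered = sum(min(count, ref_counts[word]) for word, count in cand_counts.items())
score = covered / len(candidate)
print(score)  # 5 covered words out of 7 -> ~0.714
```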

3. Perplexity

Perplexity is a metric used essentially for language models. Assuming that a language model is a probability matrix between a word and the next word that occurs in the corpus of the training set, perplexity, written PP, is “the inverse probability of the test set, normalised by the number of words”. In the perplexity equation below, there are N words in a sentence, each word is represented as w, and P is the probability of each w given the previous ones. We can also expand the probability of W using the chain rule as follows.

PP(W) = P(w1 w2 … wN)^(-1/N)

P(w1 w2 … wN) = P(w1) · P(w2 | w1) · … · P(wN | w1 … wN-1)

Given the equation above, the more accurate the language model, the higher the probability of the word sequence and the lower the perplexity. In other words, we try to minimise the value of PP(W) to get a better model.
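
A minimal sketch of this calculation, assuming we already have the per-word probabilities that some language model assigned to a sentence (the numbers below are made up):

```python
# Sketch: perplexity from per-word probabilities assigned by a language model.
# PP(W) = P(w1 .. wN) ** (-1/N), computed in log space for numerical stability.
import math

word_probs = [0.2, 0.1, 0.25, 0.05]   # P(w_i | w_1 .. w_{i-1}) for a 4-word sentence
N = len(word_probs)

log_prob = sum(math.log(p) for p in word_probs)   # log P(w1..wN) via the chain rule
perplexity = math.exp(-log_prob / N)              # inverse probability, normalised by N
print(perplexity)
```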

4. BERTScore

BERTScore is an automatic evaluation metric for text generation. It computes a similarity score between a candidate and a reference sentence. Instead of looking for an exact match, the contextual embedding of each token in the candidate sentence is compared with the embeddings of all tokens in the reference sentence, and the embeddings are compared using cosine similarity.

[Image: BERTScore pipeline — pairwise cosine similarity between reference and candidate token embeddings]
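
A minimal sketch of the matching step, using random vectors as stand-ins for real contextual BERT embeddings; the greedy max-over-cosine-similarity matching and the precision/recall/F1 aggregation follow the BERTScore definition.

```python
# Sketch of BERTScore's matching step: each candidate-token embedding is compared
# to every reference-token embedding by cosine similarity, and the best match kept.
# The embeddings below are random stand-ins for real BERT vectors.
import numpy as np

rng = np.random.default_rng(0)
cand_emb = rng.normal(size=(5, 768))   # 5 candidate tokens, hypothetical BERT vectors
ref_emb = rng.normal(size=(6, 768))    # 6 reference tokens

# Normalise so the dot product equals cosine similarity.
cand_emb /= np.linalg.norm(cand_emb, axis=1, keepdims=True)
ref_emb /= np.linalg.norm(ref_emb, axis=1, keepdims=True)

sim = cand_emb @ ref_emb.T               # pairwise cosine similarities
precision_bert = sim.max(axis=1).mean()  # best reference match per candidate token
recall_bert = sim.max(axis=0).mean()     # best candidate match per reference token
f1_bert = 2 * precision_bert * recall_bert / (precision_bert + recall_bert)
print(precision_bert, recall_bert, f1_bert)
```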
