Create a Python programme to tokenize the following using the NLTK toolkit:
a) word
b) sentence
c) remove stop words & punctuation and list the words
Note: Take the input as “What is Web Mining? Web Mining is the process of ‘Data Mining’ techniques, and extract information from Web documents and services. The main purpose of web mining is discovering useful information from the World-Wide Web and its usage patterns.”
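The three steps above can be sketched without any downloads. The exercise itself asks for NLTK, where the corresponding calls are `nltk.sent_tokenize`, `nltk.word_tokenize`, and `nltk.corpus.stopwords.words("english")`; the version below swaps in regular expressions and a tiny hand-written stop-word list (both are assumptions made for illustration), so its tokens will not match NLTK's exactly. The helper names `split_sentences`, `split_words`, and `content_words` are hypothetical.

```python
import re
import string

# Input text from the exercise, written with plain ASCII quotes for simplicity.
TEXT = ("What is Web Mining? Web Mining is the process of 'Data Mining' "
        "techniques, and extract information from Web documents and services. "
        "The main purpose of web mining is discovering useful information "
        "from the World-Wide Web and its usage patterns.")

# Small illustrative stop-word list (an assumption; NLTK ships a fuller one).
STOP_WORDS = {"what", "is", "the", "of", "and", "from", "its", "a", "an", "to", "in"}
PUNCTUATION = set(string.punctuation)


def split_sentences(text):
    # Split on whitespace that follows a sentence-ending ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    # A token is a word (internal hyphens/apostrophes kept) or one punctuation mark.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", text)


def content_words(text):
    # Drop punctuation tokens and stop words (case-insensitive).
    return [w for w in split_words(text)
            if w not in PUNCTUATION and w.lower() not in STOP_WORDS]


print(split_words(TEXT))
print(split_sentences(TEXT))
print(content_words(TEXT))
```

With NLTK installed and its tokenizer and stop-word data downloaded, the three helpers can be replaced by the library calls named above.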
Write a Python program to scrape a website and extract the following:
a) raw HTML content
b) tags (title, p, a, div)
c) all textual content.
Note: Consider the input to be any website of your choice.
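One possible shape for the scraping exercise, using only the standard library (`urllib.request` and `html.parser`). The class name `PageExtractor`, the helpers `fetch_html` and `extract`, and the sample page are hypothetical; third-party libraries such as requests and BeautifulSoup are a common alternative. This sketch records a tag's text when the tag closes, so unclosed tags are ignored.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no closing tag


class PageExtractor(HTMLParser):
    """Collects the text of title/p/a/div elements and all visible text."""
    WANTED = {"title", "p", "a", "div"}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.stack = []                           # open tags as (name, text parts)
        self.by_tag = {t: [] for t in self.WANTED}
        self.text = []                            # all visible text chunks

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, []))

    def handle_endtag(self, tag):
        if tag not in [name for name, _ in self.stack]:
            return                                # stray closing tag
        while self.stack:
            name, parts = self.stack.pop()
            if name in self.WANTED:
                self.by_tag[name].append(" ".join(parts))
            if name == tag:
                break

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk or any(name in self.SKIPPED for name, _ in self.stack):
            return
        self.text.append(chunk)
        for _, parts in self.stack:               # text belongs to every open tag
            parts.append(chunk)


def fetch_html(url):
    # (a) raw HTML content of a page chosen by the reader
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract(html):
    # (b) text per tag and (c) all textual content
    parser = PageExtractor()
    parser.feed(html)
    return parser.by_tag, " ".join(parser.text)


# Hypothetical sample page so the sketch runs offline.
SAMPLE_HTML = """<html><head><title>Web Mining</title>
<style>p { color: red; }</style></head>
<body><div><p>Hello <a href="https://example.com">link</a> world.</p></div></body></html>"""

tags, all_text = extract(SAMPLE_HTML)
print(tags)
print(all_text)
```

For a live site, pass the result of `fetch_html(url)` to `extract` instead of `SAMPLE_HTML`.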
Create a Python programme that performs Elias Gamma Encoding and Decoding for even numbers ranging from 1 to 20.
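A minimal sketch of Elias gamma coding, assuming code words are represented as strings of "0" and "1" rather than packed bits. For n ≥ 1 with binary length L, the code word is L − 1 zeros followed by the binary form of n. The function names are hypothetical.

```python
def elias_gamma_encode(n):
    """Return the Elias gamma code word of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]                        # binary form, always starts with 1
    return "0" * (len(binary) - 1) + binary    # unary length prefix + binary


def elias_gamma_decode(bits):
    """Decode a single Elias gamma code word back to an integer."""
    zeros = 0
    while bits[zeros] == "0":                  # count the leading zeros
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2)   # the next zeros + 1 bits are n


# Even numbers in the range 1 to 20, as the exercise asks.
for n in range(2, 21, 2):
    code = elias_gamma_encode(n)
    print(n, code, elias_gamma_decode(code))
```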
Create a Python programme that uses TF-IDF to find the important words in the given corpus.
Note: Collect strings from the following documents and create a corpus containing strings from documents d1, d2, and d3.
• d1: VIT Vellore University
• d2: VIT
• d3: Web
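A hand-rolled sketch of TF-IDF on the corpus above, assuming the plain textbook weighting tf = count / document length and idf = ln(N / df). Library implementations differ: scikit-learn's `TfidfVectorizer`, for example, smooths the idf and L2-normalises each row by default, so its numbers will not match these. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

corpus = {
    "d1": "VIT Vellore University",
    "d2": "VIT",
    "d3": "Web",
}


def tf_idf(corpus):
    """Return {document: {term: tf-idf score}} using tf * ln(N / df)."""
    docs = {name: text.lower().split() for name, text in corpus.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


scores = tf_idf(corpus)
for name, terms in scores.items():
    # List terms from most to least important within each document.
    print(name, sorted(terms.items(), key=lambda kv: kv[1], reverse=True))
```

Under this weighting, "vit" scores lower than "vellore" and "university" in d1 because it also appears in d2.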
Create a Python programme that performs Elias Delta Encoding and Decoding for a given number.
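A minimal sketch of Elias delta coding, again assuming bit strings rather than packed bits. For n ≥ 1 with binary length L, the code word is the Elias gamma code of L followed by the binary form of n without its leading 1. The function names are hypothetical.

```python
def elias_delta_encode(n):
    """Return the Elias delta code word of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias delta is defined for positive integers only")
    binary = bin(n)[2:]                          # binary form of n
    length = bin(len(binary))[2:]                # binary form of the bit length L
    gamma_of_length = "0" * (len(length) - 1) + length
    return gamma_of_length + binary[1:]          # drop the implicit leading 1


def elias_delta_decode(bits):
    """Decode a single Elias delta code word back to an integer."""
    zeros = 0
    while bits[zeros] == "0":                    # leading zeros of the gamma part
        zeros += 1
    length = int(bits[zeros:2 * zeros + 1], 2)   # L, the bit length of n
    start = 2 * zeros + 1
    rest = bits[start:start + length - 1]        # the L - 1 bits after the leading 1
    return int("1" + rest, 2)


number = 10                                      # example input; any positive integer
code = elias_delta_encode(number)
print(number, code, elias_delta_decode(code))
```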
Create a Python programme that implements the PageRank algorithm, plots the link graph, and prints the PageRank of each page.
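The ranking part of this exercise can be sketched with power iteration in plain Python. The damping factor 0.85, the uniform redistribution of rank from pages without out-links, and the four-page example graph are all assumptions made for illustration. Plotting is left out so the sketch has no dependencies; with NetworkX and Matplotlib installed, `networkx.draw` can render the graph and `networkx.pagerank` offers a library implementation to compare against.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new                            # converged
        rank = new
    return rank


# Hypothetical example graph: A links to B and C, B to C, C to A, D to C.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

for page, score in sorted(pagerank(example_links).items()):
    print(page, round(score, 4))
```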
Create a Python programme that uses the NetworkX module to implement the Hyperlink-Induced Topic Search (HITS) algorithm and prints the hub and authority scores.
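With NetworkX, `networkx.hits(G)` returns a pair of dictionaries (hub scores, authority scores) for a directed graph `G`. To show what that call computes, the sketch below swaps in a plain-Python version of the iteration. The fixed iteration count, the normalisation of each score vector to sum to 1, and the example graph are assumptions made for illustration.

```python
def hits(links, iterations=50):
    """Iterative HITS on a graph given as {page: [pages it links to]}."""
    nodes = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of pages linking in.
        auth = {n: 0.0 for n in nodes}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        # Hub update: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in nodes}
        # Normalise so each score vector sums to 1 (guard against an empty graph).
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {n: v / a_total for n, v in auth.items()}
        hub = {n: v / h_total for n, v in hub.items()}
    return hub, auth


# Hypothetical example graph: A links to B and C, B to C, C to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

hub_scores, authority_scores = hits(example_links)
print("Hub scores:", hub_scores)
print("Authority scores:", authority_scores)
```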
Create a Python programme that implements a decision tree and prints the accuracy percentage, precision, recall, and the predicted values.
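The reporting step of this exercise can be sketched on its own. Training the tree is left to a library such as scikit-learn's `DecisionTreeClassifier`; the block below only shows how accuracy, precision, and recall follow from the true and predicted labels in the binary case. The helper name `binary_metrics` and the label lists are made up for illustration and are not results from any real model.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many are found
    return accuracy, precision, recall


# Hypothetical test labels and predictions, standing in for a trained tree's output.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print("Predicted values:", y_pred)
print(f"Accuracy: {accuracy * 100:.1f}%")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
```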
Create a Python programme that uses the K-means clustering algorithm and displays all clusters in different colours.
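A minimal sketch of K-means (Lloyd's algorithm) on 2-D points in plain Python. The six sample points, k = 2, and the seeded random choice of initial centroids are assumptions made for illustration. Colouring is left out so the block has no dependencies; with Matplotlib, passing the returned labels as the `c` argument of `plt.scatter` draws each cluster in its own colour.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2-D points; return (label per point, list of centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # initial centroids: k distinct points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:             # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                      # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Hypothetical data: two visually separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

labels, centroids = kmeans(points, k=2)
print("Labels:", labels)
print("Centroids:", centroids)
```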