Hello! This is a brief introduction to the NLP Streamline API, a tool for streamlining your NLP workflow.
The basic_text_processing function preprocesses a raw string. It can:
- Lowercase all words
- Remove URLs
- Tokenize the string into individual words
- Remove punctuation
- Remove stopwords
The function returns a string once all of these steps are done.
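To make the steps above concrete, here is a minimal local sketch of the same pipeline using only the Python standard library. It is not the service's actual implementation: the STOPWORDS set and URL_PATTERN are small hypothetical stand-ins, and the real endpoint may tokenize or filter differently.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; the service likely uses a fuller one.
STOPWORDS = {"a", "an", "the", "i", "im", "is", "am", "are", "of", "from", "to", "and", "in"}

# Hypothetical URL matcher: anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def basic_text_processing(raw_str: str) -> str:
    """Lowercase, drop URLs, tokenize, strip punctuation, drop stopwords."""
    text = raw_str.lower()                      # 1. lowercase all words
    text = URL_PATTERN.sub(" ", text)           # 2. remove URLs
    tokens = text.split()                       # 3. tokenize on whitespace
    tokens = [t.translate(PUNCT_TABLE) for t in tokens]       # 4. remove punctuation
    tokens = [t for t in tokens if t and t not in STOPWORDS]  # 5. remove stopwords
    return " ".join(tokens)
```

Calling basic_text_processing on a sentence gives back a single space-separated string, which matches the string return type described above.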
To use it:
import requests
response = requests.get('https://nlpstreamline-api-5b8e6f695807.herokuapp.com/basic_text_processing', params={'raw_str': "Good morning! I'm a student from the University of Toronto. I'm studying computer science."})
print(response.text)
'raw_str' is the string you want to process. The function returns a string.
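If you call more than one endpoint, a small helper can build the request URL and add a timeout. The sketch below uses the standard library's urllib instead of requests so it has no extra dependency. The names BASE_URL, build_url and call_endpoint are hypothetical helpers for illustration, not part of the API.

```python
import urllib.parse
import urllib.request

# Base address taken from the examples in this document.
BASE_URL = "https://nlpstreamline-api-5b8e6f695807.herokuapp.com"


def build_url(endpoint: str, **params) -> str:
    """Return the full GET URL with the parameters URL-encoded."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}/{endpoint}?{query}"


def call_endpoint(endpoint: str, timeout: float = 10.0, **params) -> str:
    """Send the GET request and return the response body as text.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    url = build_url(endpoint, **params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, call_endpoint("basic_text_processing", raw_str="Good morning!") sends the same kind of request as the snippet above.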
The nlp_text_processing function turns your raw string into a list of stemmed and lemmatized words. To use it:
import requests
response = requests.get('https://nlpstreamline-api-5b8e6f695807.herokuapp.com/nlp_text_processing', params={'raw_str': "Good morning! I'm a student from the University of Toronto. I'm studying computer science."})
print(response.text)
It returns a list containing every word after stemming and lemmatization. Stemming uses the Snowball stemmer, and lemmatization uses the WordNet lemmatizer.
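Because response.text is a string, you may want to convert it back into a Python list. The exact response format is not stated here, so the sketch below assumes the body is either a JSON array or a Python-style list literal. parse_word_list is a hypothetical helper name.

```python
import ast
import json
from typing import List


def parse_word_list(body: str) -> List[str]:
    """Parse a response body into a list of words.

    Assumption: the body is a JSON array, or failing that a Python list literal.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # Fall back to a safe literal parser (no code execution).
        data = ast.literal_eval(body)
    if not isinstance(data, list):
        raise ValueError("expected a list of words")
    return [str(word) for word in data]
```

With this helper, parse_word_list(response.text) would give you a list you can loop over, provided the assumption about the format holds.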
The n_gram_generation function generates n-grams.
To use it:
import requests
# Return 2-grams and their frequencies
response = requests.get('https://nlpstreamline-api-5b8e6f695807.herokuapp.com/n_gram_generation', params={'raw_str': "Good morning! I'm a student from the University of Toronto. I'm studying computer science.", 'n': 2})
print(response.text)
It returns a dictionary of all n-grams, ranked from the highest frequency to the lowest. n must be a positive integer.
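For reference, n-gram counting with a frequency ranking can be sketched locally as follows. This is an illustration under assumptions, not the service's code: it takes an already tokenized list of words, joins each n-gram with a space, and the function name simply mirrors the endpoint.

```python
from collections import Counter


def n_gram_generation(tokens, n):
    """Count n-grams in a token list, ordered from most to least frequent."""
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    # zip over n shifted copies of the list yields each run of n adjacent tokens.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    # most_common() sorts by count, highest first; dicts keep that order.
    return dict(counts.most_common())
```

For the tokens of "to be or not to be" with n = 2, the bigram "to be" appears twice and so comes first in the result.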
GitHub repo: https://github.com/CielZ001/nlpstreamline-api