Repository of helper functions for the NLTK package for Natural Language Processing.
These functions help with processing dependency and constituency treebanks, for example for exploratory data analysis and preprocessing.
count_pos_tags
Counts the Part-of-Speech (POS) tags in a constituency treebank and returns the counts as a pandas DataFrame.
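As a rough illustration of the idea, the sketch below counts POS tags over a small list of `nltk.Tree` objects. The function name `count_pos_tags_sketch`, its argument, and the DataFrame column names are assumptions made for this example; the signature and output layout of the repository's `count_pos_tags` may differ.

```python
from collections import Counter

import pandas as pd
from nltk import Tree


def count_pos_tags_sketch(trees):
    """Count POS tags over an iterable of nltk.Tree objects (hypothetical helper)."""
    # Tree.pos() yields (word, tag) pairs for every leaf of the tree.
    counts = Counter(tag for tree in trees for _, tag in tree.pos())
    # Column names "tag" and "count" are chosen for this sketch only.
    return pd.DataFrame(counts.most_common(), columns=["tag", "count"])


trees = [
    Tree.fromstring("(S (NP (DT the) (NN cat)) (VP (VBD sat)))"),
    Tree.fromstring("(S (NP (DT a) (NN dog)) (VP (VBD barked)))"),
]
df = count_pos_tags_sketch(trees)
print(df)
```

Each of the three tags (DT, NN, VBD) occurs twice in this toy treebank.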
extract_constituents
Extracts the constituents in a constituency treebank and returns them as a list. The list can be processed further, for example by passing it to a Counter object to count each constituent.
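The following is a minimal sketch of one way to do this with `nltk.Tree`, assuming that "constituents" means the labels of phrasal nodes (nodes above the POS-tag level). The name `extract_constituents_sketch` and that interpretation are assumptions; the repository's `extract_constituents` may return something different, such as spans or subtrees.

```python
from collections import Counter

from nltk import Tree


def extract_constituents_sketch(tree):
    """Return the labels of all phrasal nodes in an nltk.Tree (hypothetical helper)."""
    # A preterminal (POS tag directly above a word) has height 2,
    # so height > 2 keeps only phrasal nodes.
    return [st.label() for st in tree.subtrees(lambda t: t.height() > 2)]


tree = Tree.fromstring("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
constituents = extract_constituents_sketch(tree)
print(constituents)

# Counting the items with a Counter, as described above.
constituent_counts = Counter(constituents)
print(constituent_counts)
```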
nltk_tree_to_str
Extracts all the leaves of a tree in a constituency treebank and returns them as a sentence string.
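A minimal sketch of this conversion, assuming the input is a single `nltk.Tree` and that tokens are joined with single spaces. The name `nltk_tree_to_str_sketch` is hypothetical and the repository's function may handle spacing or punctuation differently.

```python
from nltk import Tree


def nltk_tree_to_str_sketch(tree):
    """Join the leaves of an nltk.Tree into a space-separated sentence (hypothetical helper)."""
    # Tree.leaves() returns the words in left-to-right order.
    return " ".join(tree.leaves())


tree = Tree.fromstring("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
sentence = nltk_tree_to_str_sketch(tree)
print(sentence)
```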
remove_null_elements
Removes all -NONE- tags in a constituency treebank. Note: the constituency tags of the affected branches may no longer be syntactically correct after -NONE- removal.
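One possible approach is sketched below, assuming `nltk.Tree` input: drop every -NONE- node and then drop any parent left with no children. The name `remove_null_elements_sketch` and the pruning policy are assumptions for illustration, not necessarily what the repository's function does.

```python
from nltk import Tree


def remove_null_elements_sketch(tree):
    """Return a copy of the tree without -NONE- branches, or None if nothing remains
    (hypothetical helper)."""
    if isinstance(tree, str):
        # Leaves (words) are kept as they are.
        return tree
    if tree.label() == "-NONE-":
        return None
    children = [remove_null_elements_sketch(child) for child in tree]
    children = [child for child in children if child is not None]
    if not children:
        # A node whose only content was a null element is removed as well.
        return None
    return Tree(tree.label(), children)


tree = Tree.fromstring("(S (NP-SBJ (-NONE- *)) (VP (VB go) (NP (NN home))))")
cleaned = remove_null_elements_sketch(tree)
print(cleaned)
```

In this example the empty subject NP-SBJ disappears entirely, so the S node is left with only a VP child. This is the kind of change in structure that the note above refers to.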
swap_xpos_tags
Replaces the UPOS tags with XPOS tags in a dependency treebank. This is intended for preparing a custom dataset for use with LAL-HPSG.
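The sketch below shows the general idea under two assumptions that the description above does not confirm: that the treebank is in CoNLL-U format (ten tab-separated columns, with UPOS in the fourth and XPOS in the fifth), and that "swapping" means overwriting the UPOS column with the XPOS value. The name `swap_xpos_tags_sketch` is hypothetical.

```python
def swap_xpos_tags_sketch(conllu_text):
    """Overwrite the UPOS column with the XPOS value in CoNLL-U text (hypothetical helper)."""
    out = []
    for line in conllu_text.splitlines():
        # Comment lines and blank sentence separators are passed through unchanged.
        if line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 10:
                cols[3] = cols[4]  # UPOS (index 3) takes the XPOS value (index 4)
                line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)


sample = "\n".join([
    "# text = Dogs bark",
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_",
])
swapped = swap_xpos_tags_sketch(sample)
print(swapped)
```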
These functions are intended to aid research built on top of the Natural Language Toolkit (nltk) Python package.