Identifies the language of a text from its character bigrams.
This Java project was completed for my Artificial Intelligence course at Concordia with Dr. Leila Kosseim. Its requirements were:
- The system must be able to build bigram counts from training corpora
- The system must be able to use those bigrams to identify the language of a given sentence
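These two requirements amount to training one bigram table per language and then scoring a sentence against each table. Below is a minimal sketch, not the project's actual code. The class name `BigramModel`, its methods, the 26-letter alphabet, add-delta smoothing with δ = 0.5, and log10 scoring are all assumptions made for illustration. It applies a simple lowercase, letters-only filter inline so it stays self-contained.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the project's actual code: a character-bigram
// language model with add-delta smoothing. The 26-letter alphabet and
// delta = 0.5 are assumptions.
class BigramModel {
    private static final int ALPHABET = 26;
    private final double delta;
    private final int[][] counts = new int[ALPHABET][ALPHABET];
    private final int[] rowTotals = new int[ALPHABET];

    BigramModel(double delta) {
        this.delta = delta;
    }

    // Maps 'a'..'z' to 0..25; any other character returns -1 and breaks the sequence.
    private static int index(char c) {
        return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
    }

    // Counts every pair of adjacent letters in the training corpus.
    void train(String corpus) {
        int prev = -1;
        for (char c : corpus.toLowerCase(Locale.ROOT).toCharArray()) {
            int cur = index(c);
            if (prev >= 0 && cur >= 0) {
                counts[prev][cur]++;
                rowTotals[prev]++;
            }
            prev = cur;
        }
    }

    // Smoothed conditional probability P(second | first).
    double probability(int first, int second) {
        return (counts[first][second] + delta) / (rowTotals[first] + delta * ALPHABET);
    }

    // Sums the log10 probabilities of every bigram in the sentence.
    double score(String sentence) {
        double total = 0.0;
        int prev = -1;
        for (char c : sentence.toLowerCase(Locale.ROOT).toCharArray()) {
            int cur = index(c);
            if (prev >= 0 && cur >= 0) {
                total += Math.log10(probability(prev, cur));
            }
            prev = cur;
        }
        return total;
    }

    // Returns the language whose model gives the sentence the highest score.
    static String identify(Map<String, BigramModel> models, String sentence) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, BigramModel> entry : models.entrySet()) {
            double s = entry.getValue().score(sentence);
            if (s > bestScore) {
                bestScore = s;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny illustrative corpora; real training corpora would be far larger.
        BigramModel en = new BigramModel(0.5);
        en.train("the quick brown fox jumps over the lazy dog and then the cat sat on the mat");
        BigramModel fr = new BigramModel(0.5);
        fr.train("le chat est sur le tapis et le chien mange du pain avec les enfants");

        Map<String, BigramModel> models = new HashMap<>();
        models.put("English", en);
        models.put("French", fr);
        System.out.println(identify(models, "the cat and the dog"));
    }
}
```

Running `main` prints `English`: every bigram of the test sentence except `an` is more probable under the English counts. Summing logarithms rather than multiplying raw probabilities keeps long sentences from underflowing to zero, and smoothing keeps an unseen bigram from driving a language's score to negative infinity.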
The system:
- ignores punctuation
- treats all letters as lowercase
- removes diacritics
- works on 2-character sequences (bigrams)
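The preprocessing rules above can be sketched with the JDK's `java.text.Normalizer`: decompose the text to NFD, strip the combining marks, lowercase it, replace everything that is not a letter with a space, and then emit adjacent letter pairs. The class `TextPreprocessor` and its methods are hypothetical. Treating every non-letter as a word boundary (so no bigram spans two words) is an assumption, and letters with no canonical decomposition, such as `ß` or `ø`, are simply dropped by this version.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the preprocessing rules: strip diacritics,
// lowercase everything, drop punctuation, then emit 2-character sequences.
// Treating any non-letter as a word boundary is an assumption.
class TextPreprocessor {

    // Decomposes accented letters (U+00E9 becomes 'e' followed by U+0301)
    // and removes the resulting combining marks.
    static String removeDiacritics(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Removes diacritics, lowercases, and turns runs of non-letters into single spaces.
    static String normalize(String text) {
        String plain = removeDiacritics(text).toLowerCase(Locale.ROOT);
        return plain.replaceAll("[^a-z]+", " ").trim();
    }

    // Emits every pair of adjacent letters inside each word.
    static List<String> bigrams(String text) {
        List<String> result = new ArrayList<>();
        for (String word : normalize(text).split(" ")) {
            for (int i = 0; i + 1 < word.length(); i++) {
                result.add(word.substring(i, i + 2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Unicode escapes spell "Ça va, Élodie?" while keeping the source ASCII-only.
        System.out.println(bigrams("\u00C7a va, \u00C9lodie?"));
    }
}
```

Running `main` prints `[ca, va, el, lo, od, di, ie]`: the cedilla and acute accent are removed, the comma and question mark become word boundaries, and single-letter words contribute no bigrams.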