
Simple-Regex-Engine-in-C

This code prints the starting and ending indices of the match in the text if the pattern matches, and 0 otherwise.

This code supports the following matches, which are required for this assignment:-

Individual characters and digits
Ranges: a-z, A-Z, 0-9 or any range in between
Character classes
Macros such as +, * and ?
\w and \d
Greedy and non-greedy matching for *

Certain additional matches are also supported, since they were not very difficult to implement:-

'.', which matches any character except \n
Greedy and non-greedy matching for +
Starts-with (^) and ends-with ($) anchors
Negated character classes: [^...]
\W, \D, \s and \S
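
As an illustration of the escape classes listed above, here is a minimal sketch of how a single text character might be tested against \w, \d, \s and their negations. The function names are hypothetical and not taken from this code, and \w is assumed to follow the usual convention of letters, digits and underscore.

```c
#include <ctype.h>
#include <stdbool.h>

/* \w: letters, digits and underscore (conventional definition, assumed here). */
bool is_word_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical dispatcher: `escape` is the letter that follows the backslash. */
bool match_escape(char escape, char c) {
    switch (escape) {
    case 'w': return is_word_char(c);
    case 'W': return !is_word_char(c);
    case 'd': return isdigit((unsigned char)c) != 0;
    case 'D': return isdigit((unsigned char)c) == 0;
    case 's': return isspace((unsigned char)c) != 0;
    case 'S': return isspace((unsigned char)c) == 0;
    default:  return c == escape; /* any other escape matches itself literally */
    }
}
```

The casts to unsigned char avoid undefined behaviour in the <ctype.h> functions when char is signed and the text contains bytes above 127.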

Data Structures used:-

The input pattern is parsed and stored in a structure defined as follows:

    typedef struct regex_t {
        int type;
        union {
            char ch;
            char *char_class;
        };
    } regex;
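
To make the structure concrete, the sketch below fills an array of regex nodes from a pattern string. The enum names and the parse_simple function are hypothetical and the real parser may differ. Only single-character tokens are handled here, so character classes and backslash escapes are left out. The anonymous union requires C11 or later.

```c
#include <stddef.h>

/* Hypothetical type tags; the enum used by the actual code may differ. */
enum node_type {
    T_END_OF_PATTERN, T_CHAR, T_DOT, T_STAR, T_PLUS, T_QUESTION,
    T_BEGIN, T_END, T_CHAR_CLASS
};

typedef struct regex_t {
    int type;
    union {
        char ch;          /* used when type == T_CHAR */
        char *char_class; /* used when type == T_CHAR_CLASS */
    };
} regex;

/* Parses `pattern` into `out` (capacity `cap` nodes, terminator included).
 * Returns the number of nodes written before the terminator, or -1 if
 * `out` is too small. */
int parse_simple(const char *pattern, regex *out, size_t cap) {
    size_t n = 0;
    if (cap == 0) return -1;
    for (const char *p = pattern; *p != '\0'; p++) {
        if (n + 1 >= cap) return -1; /* keep one slot for the terminator */
        if (*p == '^' && p == pattern) {
            out[n].type = T_BEGIN;   /* anchor only at the very start */
        } else if (*p == '$' && p[1] == '\0') {
            out[n].type = T_END;     /* anchor only at the very end */
        } else if (*p == '.') {
            out[n].type = T_DOT;
        } else if (*p == '*') {
            out[n].type = T_STAR;
        } else if (*p == '+') {
            out[n].type = T_PLUS;
        } else if (*p == '?') {
            out[n].type = T_QUESTION;
        } else {
            out[n].type = T_CHAR;
            out[n].ch = *p;
        }
        n++;
    }
    out[n].type = T_END_OF_PATTERN;
    return (int)n;
}
```

With this sketch, the pattern ^a.*$ would produce five nodes (begin anchor, the character a, dot, star, end anchor) followed by the terminator node.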

Logic used:-

The text and the number of patterns are read from input.
Each input pattern is parsed, converted and stored in the structure defined above.
The match_here function, as defined in class by the instructor, has been converted to an iterative version called match_util. It iterates through the pattern and the text and returns the starting index if a match is found, and -1 otherwise.
The match length is a global variable holding the number of characters matched. It is used to calculate the ending index as follows: End index = Start index + match length - 1
The parsed pattern and the character class buffers are also global variables for convenience of scope (although this is bad programming practice).
The macros and metacharacters are represented by their types, which are enums, for ease of use, easy debugging and readability.
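
The search and the end-index calculation described above can be sketched as follows. This is not the iterative match_util from this code: it is a small recursive matcher in the well-known style of Kernighan and Pike, working directly on the pattern string, and it returns the match length instead of storing it in a global. It handles only literal characters, '.', greedy *, ^ and $. The name find_match is hypothetical.

```c
/* Returns the number of text characters matched by `pat` at the start of
 * `text`, or -1 if there is no match at this position. */
int match_here(const char *pat, const char *text);

/* Greedy c*: consume as many characters as possible, then back off one
 * at a time until the rest of the pattern matches. */
int match_star(char c, const char *pat, const char *text) {
    const char *t = text;
    while (*t != '\0' && (c == '.' ? *t != '\n' : *t == c)) t++;
    for (;;) {
        int rest = match_here(pat, t);
        if (rest >= 0) return (int)(t - text) + rest;
        if (t == text) return -1;
        t--;
    }
}

int match_here(const char *pat, const char *text) {
    if (pat[0] == '\0') return 0;
    if (pat[1] == '*') return match_star(pat[0], pat + 2, text);
    if (pat[0] == '$' && pat[1] == '\0') return *text == '\0' ? 0 : -1;
    if (*text != '\0' && (pat[0] == '.' ? *text != '\n' : pat[0] == *text)) {
        int rest = match_here(pat + 1, text + 1);
        return rest < 0 ? -1 : rest + 1;
    }
    return -1;
}

/* Tries every start position in turn. Returns the start index and stores
 * the end index in *end_index, or returns -1 if nothing matches. */
int find_match(const char *pat, const char *text, int *end_index) {
    int anchored = (pat[0] == '^');
    if (anchored) pat++;
    for (int start = 0; ; start++) {
        int len = match_here(pat, text + start);
        if (len >= 0) {
            *end_index = start + len - 1; /* End = Start + length - 1 */
            return start;
        }
        if (anchored || text[start] == '\0') return -1;
    }
}
```

For the pattern b.*d on the text abcde, this sketch reports start index 1 and end index 3. Note that a zero-length match gives an end index of start - 1 under this formula.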

Input:- From stdin, as follows

text

no_of_patterns

pattern0

pattern1

Constraints:-

Max length of text = 4000

Max length of patterns = 1000
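
Given these limits, the input could be read with fixed-size buffers along the following lines. This is only a sketch: the helper names are hypothetical, and it assumes each item sits on its own line with no blank lines in between.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TEXT    4000 /* from the constraints above */
#define MAX_PATTERN 1000

/* Removes a trailing "\n" or "\r\n" in place. */
void chomp(char *s) {
    s[strcspn(s, "\r\n")] = '\0';
}

/* Reads one line into buf (capacity cap) and strips the line ending.
 * Returns 1 on success, 0 on end of input or read error. */
int read_line(FILE *in, char *buf, size_t cap) {
    if (fgets(buf, (int)cap, in) == NULL) return 0;
    chomp(buf);
    return 1;
}

/* Parses the pattern count from a line. Returns 1 on success, 0 otherwise. */
int parse_count(const char *line, int *count) {
    return sscanf(line, "%d", count) == 1;
}
```

A caller would declare the text buffer as char text[MAX_TEXT + 2] and each pattern buffer as char pattern[MAX_PATTERN + 2], leaving room for the newline and the terminating null byte, and pass stdin as the stream.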

Output:- As required by the assignment, this code outputs one of the following for each input pattern:

0 - if there is no match

1 start_index end_index - if a match is found
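
The output format above can be produced by a small helper such as the one below. The name format_result is hypothetical, and a negative start index is assumed to mean "no match", in line with the -1 returned by match_util.

```c
#include <stdio.h>

/* Writes "0" for no match (start < 0), or "1 start end" for a match.
 * Returns the value of snprintf, i.e. the length of the formatted line. */
int format_result(char *buf, size_t cap, int start, int end) {
    if (start < 0) return snprintf(buf, cap, "0");
    return snprintf(buf, cap, "1 %d %d", start, end);
}
```

Each formatted line can then be printed with puts, one line per pattern.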